ABSTRACT

Consider the problem of verifying security properties of a 
cryptographic protocol coded in C. We propose an automatic 
solution that needs neither a pre-existing protocol descrip- 
tion nor manual annotation of source code. First, symboli- 
cally execute the C program to obtain symbolic descriptions 
for the network messages sent by the protocol. Second, ap- 
ply algebraic rewriting to obtain a process calculus descrip- 
tion. Third, run an existing protocol analyser (ProVerif) 
to prove security properties or find attacks. We formalise 
our algorithm and appeal to existing results for ProVerif to 
establish computational soundness under suitable circum- 
stances. We analyse only a single execution path, so our 
results are limited to protocols with no significant branch- 
ing. The results in this paper provide the first computation- 
ally sound verification of weak secrecy and authentication 
for (single execution paths of) C code. 

1. INTRODUCTION 

Recent years have seen great progress in formal verifica- 
tion of cryptographic protocols, as illustrated by powerful 
tools like ProVerif [13], CryptoVerif [12] or AVISPA [3]. 
There remains, however, a large gap between what we verify 
(formal descriptions of protocols, say, in the pi calculus) and 
what we rely on (protocol implementations, often in low- 
level languages like C). The need to start the verification 
from C code has been recognised before and implemented 
in tools hke CSur [26] and ASPfER [18], but the methods 
proposed there are still rather limited. Consider, for exam- 
ple, the small piece of C code in fig. 1 that checks whether 
a message received from the network matches a message au- 
thentication code. Intuitively, if the key is honestly chosen 
and kept secret from the attacker then with overwhelming 
probability the event will be triggered only if another honest 
participant (with access to the key) generated the message. 
Unfortunately, previous approaches cannot prove this property:
the analysis of CSur is too coarse to deal with authentication
properties like this and ASPIER cannot directly deal with code
manipulating memory through pointers. Furthermore, the previous
works do not offer a definition of security directly for C code,
i.e. they do not formally state what it means for a C program to
satisfy a security property, which makes it difficult to evaluate
their overall soundness. The goal of our work is to improve upon
this situation by giving a formal definition of security straight
for C code and proposing a method that can verify secrecy and
authentication for typical memory-manipulating implementations
like the one in fig. 1 in a fully automatic and scalable manner,
without relying on a pre-existing protocol specification.

    void * key; size_t keylen;
    readenv("k", &key, &keylen);
    size_t len;
    read(&len, sizeof(len));
    if (len > 1000) exit();
    void * buf = malloc(len + 2 * MAC_LEN);
    read(buf, len);
    mac(buf, len, key, keylen, buf + len);
    read(buf + len + MAC_LEN, MAC_LEN);
    if (memcmp(buf + len,
               buf + len + MAC_LEN,
               MAC_LEN) == 0)
      event("accept", buf, len);

    in(x1); in(x2); if x2 = mac(k, x1) then event accept(x1)

Figure 1: An example C fragment together with the
extracted model.

Our method proceeds by extracting a high-level model 
from the C code that can then be verified using existing 
tools (we use ProVerif in our work). Currently we restrict 
our analysis to code in which all network outputs happen on 
a single execution path, but otherwise we do not require use 
of any specific programming style, with the aim of applying 
our methods to legacy implementations. In particular, we 
do not assume memory safety, but instead explicitly verify 
it during model extraction. The method still assumes that
the cryptographic primitives such as encryption or hashing 
are implemented correctly — verification of these is difficult 
even when done manually [2]. 

The two main contributions of our work are: 

• a formal definition of security properties for source code;

• an algorithm that computes a high-level model of the 
protocol implemented by a C program. 

We implement and evaluate the algorithm as well as give a 
proof of its soundness with respect to our security definition. 
Our definition of security for source code is given by linking 
the semantics of a programming language, expressed as a 
transition system, to a computational security definition in 
the spirit of [15, 25, 41]. We allow an arbitrary number 
of sessions. We restrict our definition to trace properties 
(such as weak secrecy or authentication), but do not consider 
observational equivalence (for strong secrecy, say). 

Due to the complexity of the C language we give the 
formal semantics for a simple assembler-like language into 
which C code can be easily compiled, as in other symbolic 
execution approaches such as [19]. The soundness of this 
step can be obtained by using well-known methods, as out- 
lined in section 3. 

Our model-extraction algorithm produces a model in an 
intermediate language without memory access or destruc- 
tive updates, while still preserving our security definition. 
The algorithm is based on symbolic execution [30] of the 
C program, using symbolic expressions to over-approximate 
the sets of values that may be stored in memory during 
concrete execution. The main difference from existing sym- 
bolic execution algorithms (such as [17] or [24]) is that our 
variables represent bitstrings of potentially unknown length, 
whereas in previous algorithms a single variable corresponds 
to a single byte. 

We show how the extracted models can be further sim- 
plified into the form understood by ProVerif. We apply the 
computational soundness result from [4] to obtain conditions 
where the symbolic security definition checked by ProVerif 
corresponds to our computational security definition. Com- 
bined with the security-preserving property of the model 
extraction algorithm this provides a computationally sound 
verification of weak secrecy and authentication for C. 

Outline of our Method. The verification proceeds in sev- 
eral steps, as outlined in fig. 2. The method takes as input: 

• the C implementations of the protocol participants, 
containing calls to a special function event as in fig. 1, 

• an environment process (in the modelling language) 
which spawns the participants, distributes keys, etc., 

• symbolic models of cryptographic functions used by 
the implementation, 

• a property that event traces in the execution are sup- 
posed to satisfy with overwhelming probability. 

We start by compiling the program down to a simple 
stack-based instruction language (CVM) using CIL [34] to 
parse and simplify the C input. The syntax and semantics 
of CVM are presented in section 2 and the translation from 
C to CVM is informally described in section 3. 

In the next step we symbolically execute CVM programs 
to eliminate memory accesses and destructive updates, thus 
obtaining an equivalent program in an intermediate model 
language (IML) — a version of the applied pi calculus ex- 
tended with bitstring manipulation primitives. For each al- 
located memory area the symbolic execution stores an ex- 
pression describing how the contents of the memory area 
have been computed. For instance a certain memory area 
might be associated with an expression hmac(01|x, k), where
x is known to originate from the network, k is known to be
an environment variable, and | denotes concatenation. The
symbolic execution does not enter the functions that imple- 
ment the cryptographic primitives, it uses the provided sym- 
bolic models instead. These models thus form the trusted 
base of the verification. An example of the symbolic execution
output is shown at the bottom of fig. 1. We define the
syntax and semantics of IML in section 4 and describe the
symbolic execution in section 6.

Figure 2: An outline of the method.

Our definition of security for source code is given in sec- 
tion 5. The definition is generic in that it does not assume 
a particular programming language. We simply require that 
the semantics of a language is given as a set of transitions of 
a certain form, and define a computational execution of the 
resulting transition system in the presence of an attacker 
and the corresponding notion of security. This allows one 
to apply the same security definition to protocols expressed 
both in the low-level implementation language and in the 
high-level model-description language, and to formulate a 
correspondence between the two. 

Given that the transition systems generated by different 
languages are required to be of the same form, we can mix 
them in the same execution. This allows us to use CVM to 
specify a single executing participant, but at the same time 
use IML to describe an environment process that spawns 
multiple participants and allows them to interact. In par- 
ticular, CVM need not be concerned with concurrency, thus 
making symbolic execution easier. Given an environment 
process $P_E$ with $n$ holes, we write $P_E[P_1, \dots, P_n]$ for a
process where the $i$th hole is filled with $P_i$, which can be either a
CVM or an IML process. The soundness result for symbolic
execution (theorem 1) states that if $P_1, \dots, P_n$ are CVM
processes and $\tilde{P}_1, \dots, \tilde{P}_n$ are IML models resulting from
their symbolic execution then for any environment process $P_E$ the
security of $P_E[P_1, \dots, P_n]$ with respect to a trace property
$p$ relates to the security of $P_E[\tilde{P}_1, \dots, \tilde{P}_n]$ with respect to $p$.

To verify the security of an IML process, we replace its 
bitstring-manipulating expressions by applications of con- 
structor and destructor functions, thus obtaining a process 
in the applied pi-calculus (the version proposed in [14] and 
augmented with events). We can then apply a computa- 
tional soundness result, such as the one from [4], to specify 
conditions under which such a substitution is computation- 
ally sound: if the resulting pi calculus process is secure in a 
symbolic model (as can be checked by ProVerif) then it is 
asymptotically secure with respect to our computational no- 
tion of security. The correctness of translation from IML to 
pi is captured by theorem 2 and the computational sound- 
ness for resulting pi processes is captured by theorem 3. The 
verification of IML (and these two theorems in particular) 
is described in section 7. 



Theoretical and Practical Evaluation. Theorems 1 to 3 
establish the correctness of our approach. In a nutshell, their 
significance is as follows: given implementations $P_1, \dots, P_n$
of protocol participants in CVM, which are automatically
obtained from the corresponding C code, and an IML process
$P_E$ that describes an execution environment, if $P_1, \dots, P_n$
are successfully symbolically executed with resulting models
$\tilde{P}_1, \dots, \tilde{P}_n$, the IML process $P_E[\tilde{P}_1, \dots, \tilde{P}_n]$ is
successfully translated to a pi process $P_\pi$, and ProVerif
successfully verifies $P_\pi$ against a trace property $p$ then
$P_1, \dots, P_n$ form a secure protocol implementation with respect
to the environment $P_E$ and property $p$.

We are aiming to apply our method to large legacy code 
bases like OpenSSL. As a step towards this goal we evaluated 
it on a range of protocol implementations, including recent 
code for smart electricity meters [37]. We were able to find 
bugs in preexisting implementations or to verify them with- 
out having to modify the code. Section 8 provides details. 

The current restriction of analysis to a single execution 
path may seem prohibitive at first sight. In fact, a great 
majority of protocols (such as those in the extensive SPORE 
repository [36]) follow a fixed narration of messages between 
participants, where any deviation from the expected message 
leads to termination. For such protocols, our method allows 
us to capture and analyse the fixed narration directly from 
the C code. In the future we plan to extend the analysis to 
more sophisticated control flow.

Related Work. We mention particularly relevant works here 
and provide a broader survey in section 9. One of the first 
attempts at cryptographic verification of C code is contained 
in [26], where a C program is used to generate a set of Horn
clauses that are then solved using a theorem prover. The 
method is implemented in the tool CSur. We improve upon 
CSur in two ways in particular. 

First, we have an explicit attacker model with a standard 
computational attacker. The attacker in CSur is essentially 
symbolic — it is allowed to apply cryptographic operations, 
but cannot perform any arithmetic computations. 

Second, we handle authentication properties in addition 
to secrecy properties. Adding authentication to CSur would 
be non-trivial, due to a rather coarse over-approximation of 
C code. For instance, the order of instructions in CSur is ig- 
nored, and writing a single byte into an array with unknown 
length is treated the same as overwriting the whole array. 
Authentication, however, crucially depends on the order of 
events in the execution trace as well as making sure that the 
authenticity of a whole message is preserved and not only of 
a single byte of it. 

ASPIER [18] uses model checking to verify implementa- 
tions of cryptographic protocols. The model checking oper- 
ates on a protocol description language, which is rather more 
abstract than C; for instance, it does not contain pointers 
and cannot express variable message lengths. The transla- 
tion from C to the protocol language is not described in the 
paper. Our method applies directly to C code with pointers, 
so that we expect it to provide much greater automation. 

Corin and Manzano [19] report an extension of the KLEE 
test-generation tool [17] that allows KLEE to be applied to 
cryptographic protocol implementations (but not to extract 
models, as in our work). They do not extend the class of 
properties that KLEE is able to test for; in particular, test- 
ing for trace properties is not yet supported. Similarly to 
our work, KLEE is based on symbolic execution; the main 
difference is that [19] treats every byte in a memory buffer 
separately and thus only supports buffers of fixed length. 

An appendix includes proofs for all the results stated in 
this paper. 

2. C VIRTUAL MACHINE (CVM) 

This section describes our low-level source language CVM 
(C Virtual Machine). The language is simple enough to 
formalise, while at the same time the operations of CVM 
are closely aligned with the operations performed by C pro- 
grams, so that it is easy to translate from C to CVM. We 
shall describe such a translation informally in section 3. 

The model of execution of CVM is a stack-based machine 
with random memory access. All operations with values 
are performed on the stack, and values can be loaded from 
memory and stored back to memory. The language con- 
tains primitive operations that are necessary for implement- 
ing security protocols: reading values from the network or 
the execution environment, choosing random values, writ- 
ing values to the network and signalling events. The only 
kind of conditional that CVM supports is a testing opera- 
tion that checks a boolean condition and aborts execution 
immediately if it is not satisfied. 

The fact that CVM permits no looping or recursion in the 
program allows us to inline all function calls, so that we do 
not need to add a call operation to the language itself. For 
simplicity of presentation we omit some aspects of the C 
language that are not essential for describing the approach, 
such as global variable initialisation and structures. We also 
restrict program variables to all be of the same size: for the 
rest of the paper we choose a fixed but arbitrary $N \in \mathbb{N}$ and
assume sizeof(v) = N for all program variables v. Our
implementation does not have these restrictions and deals 
with the full C language. 

Let $BS = \{0, 1\}^*$ be the set of finite bitstrings with the
empty bitstring denoted by $\varepsilon$. For a bitstring $b$ let $|b|$ be the
length of $b$ in bits. Let Var be a countably infinite set of
variables. We write $f : X \rightharpoonup Y$ to denote a partial function
and let $dom(f) \subseteq X$ be the set of $x$ for which $f(x)$ is defined.
We write $f(x) = \bot$ when $f$ is not defined on $x$ and use the
notation $f\{x \mapsto a\}$ to update functions.

Let Ops be a finite set of operation symbols such that
each $op \in Ops$ has an associated arity $ar(op)$ and an
efficiently computable partial function $A_{op} : BS^{ar(op)} \rightharpoonup BS$.
The set Ops is meant to contain both the primitive operations
of the language (such as the arithmetic or comparison
operators of C) and the cryptographic primitives that are
used by the implementation. The security definitions of this
paper (given later) assume an arbitrary security parameter.
Since real-life cryptographic protocols are typically designed
and implemented for a fixed value of the security parameter,
for the rest of the paper we let $k_0 \in \mathbb{N}$ be the security
parameter with respect to which the operations in Ops are
chosen.

A CVM program is simply a sequence of instructions,
as shown in fig. 3. To define the semantics of CVM we
choose two functions that relate bitstrings to integer values,
$val : BS \to \mathbb{N}$ and $bs : \mathbb{N} \to BS$, and require that for
$n < 2^N$ the value $bs(n)$ is a bitstring of length $N$ such that
$val(bs(n)) = n$. We allow bs to have arbitrary behaviour
for larger numbers. The functions val and bs encapsulate
architecture-specific details of integer representation such as
the endianness. Even though these functions capture an
unsigned interpretation of bitstrings, we only use them when
accessing memory cells and otherwise place no restriction
on how the bitstrings are interpreted by the program
operations. For instance, the set Ops can contain both signed
and unsigned arithmetic and comparison operators.
Bitstring representations of integer constants shall be written
as i1, i20, etc.; for instance, i10 = bs(10).

    b ∈ BS, v ∈ Var, op ∈ Ops

    src   ::= read | rnd        input source
    dest  ::= write | event     output destination
    instr ::=                   instruction
          Const b               constant value
          Ref v                 pointer to variable
          Malloc                pointer to fresh memory
          Load                  load from memory
          In v src              input
          Env v                 environment variable
          Apply op              operation
          Out dest              output
          Test                  test a condition
          Store                 write to memory
    P ∈ CVM ::= {instr;}*       program

Figure 3: The syntax of CVM.
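As a minimal illustration of the requirement val(bs(n)) = n, the following C sketch fixes N = 32 and represents a bitstring as a string of the characters '0' and '1', most significant bit first. The representation, the bit order and the restriction of val to at most N bits are assumptions made only for this example; the text deliberately leaves these architecture-specific choices open.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N 32  /* assumed word size in bits; the text leaves N arbitrary */

/* bs: writes n as N characters '0'/'1', most significant bit first,
   followed by a terminating NUL. The bit order is an assumption. */
static void bs(uint32_t n, char out[N + 1]) {
    for (int i = 0; i < N; i++)
        out[i] = ((n >> (N - 1 - i)) & 1u) ? '1' : '0';
    out[N] = '\0';
}

/* val: unsigned value of a bitstring of at most N bits. */
static uint32_t val(const char *b) {
    size_t len = strlen(b);
    assert(len <= N);
    uint32_t n = 0;
    for (size_t i = 0; i < len; i++)
        n = (n << 1) | (uint32_t)(b[i] == '1');
    return n;
}
```

Under these assumptions every 32-bit value n satisfies n < 2^N, so val(bs(n)) = n holds for all inputs of this bs.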

We let $Addr = \{1, \dots, 2^N - 1\}$ be the set of valid memory
addresses. The reason we exclude 0 is to allow the length
of the memory to be represented in $N$ bits. The semantic
configurations of CVM are of the form $(\mathcal{A}^c, \mathcal{M}^c, \mathcal{S}^c, P)$,
where

• $\mathcal{M}^c : Addr \rightharpoonup \{0, 1\}$ is a partial function that
represents concrete memory and is undefined for
uninitialised cells,

• $\mathcal{A}^c \subseteq Addr$ is the set of allocated memory addresses,

• $\mathcal{S}^c$ is a list of bitstrings representing the execution
stack,

• $P \in CVM$ is the executing program.

Semantic transitions are of the form $(\eta, s) \xrightarrow{l} (\eta', s')$, where
$s$ and $s'$ are semantic configurations, $\eta$ and $\eta'$ are
environments (mappings from variables to bitstrings) and $l$ is a
protocol action such as reading or writing values from the
attacker or a random number generator, or raising events.
The formal semantics of CVM is given in appendix C; in this
section we give an informal overview. Before the program is
executed, each referenced variable $v$ is allocated an address
$addr(v)$ in $\mathcal{M}^c$ such that all allocations are non-overlapping.
If the program contains too many variables to fit in memory,
the execution does not proceed. Next, the instructions
in the program are executed one by one as described below.
For $a, b \in \mathbb{N}$ we define $\{a\}_b = \{a, \dots, a + b - 1\}$.

• Const b places b on the stack.

• Ref v places bs(addr(v)) on the stack.

• Malloc takes a value $s$ from the stack, reads a value $p$
from the attacker, and if the range $\{val(p)\}_{val(s)}$ does
not contain allocated cells, it becomes allocated and
the value $p$ is placed on the stack. Thus the attacker
gets to choose the beginning of the allocated memory
area.

• Load takes values $l$ and $p$ from the stack. In case
$\{val(p)\}_{val(l)}$ is a completely initialised range in memory,
the contents of that range are placed on the stack.
In case some of the bits are not initialised, the value
for those bits is read from the attacker.

• In v read or In v rnd takes a value $l$ from the stack.
In v read reads a value of length $val(l)$ from the attacker
and In v rnd requests a random value of length
$val(l)$. The resulting value $b$ is then placed on the
stack. The environment $\eta$ is extended by the binding
$v \mapsto b$.

• Env v places $\eta(v)$ and $bs(|\eta(v)|)$ on the stack.

• Apply op with $ar(op) = n$ applies $A_{op}$ to $n$ values on
the stack, replacing them by the result.

• Out write sends the top of the stack to the attacker
and Out event raises an event with the top of the stack
as payload. Events with multiple arguments can be
represented using a suitable bitstring pairing operation.
Both commands remove the top of the stack.

• Test takes the top of the stack and checks whether it
is i1. If yes, the execution proceeds, otherwise it stops.

• Store takes values $p$ and $b$ from the stack and writes
$b$ into memory at position starting with $val(p)$.
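The stack discipline described above can be sketched with a deliberately simplified interpreter. The sketch below is an illustration only, not the formal semantics of appendix C: stack values are 32-bit words rather than arbitrary-length bitstrings, memory, the attacker and the environment are omitted, and only Const, Apply (with two hypothetical operations, addition and equality), Test and Out event are modelled. All type and function names are assumptions of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified instruction set: a small subset of fig. 3. */
enum opcode { CONST, APPLY_ADD, APPLY_EQ, TEST, OUT_EVENT };

struct instr { enum opcode op; uint32_t arg; /* used by CONST only */ };

#define STACK_MAX 64
#define EVENTS_MAX 16

struct machine {
    uint32_t stack[STACK_MAX];
    size_t sp;                     /* number of values on the stack */
    uint32_t events[EVENTS_MAX];   /* payloads of raised events */
    size_t n_events;
};

/* Runs prog from an empty stack. Returns 1 if all instructions were
   executed, 0 if a Test failed or the machine got stuck. Events raised
   before stopping remain recorded in m. */
static int run(struct machine *m, const struct instr *prog, size_t n) {
    assert(m != NULL && prog != NULL);
    m->sp = 0;
    m->n_events = 0;
    for (size_t i = 0; i < n; i++) {
        switch (prog[i].op) {
        case CONST:
            if (m->sp == STACK_MAX) return 0;
            m->stack[m->sp++] = prog[i].arg;
            break;
        case APPLY_ADD:
        case APPLY_EQ: {
            if (m->sp < 2) return 0;
            uint32_t b = m->stack[--m->sp];
            uint32_t a = m->stack[--m->sp];
            m->stack[m->sp++] =
                prog[i].op == APPLY_ADD ? a + b : (uint32_t)(a == b);
            break;
        }
        case TEST:                 /* proceed only if the top is 1 */
            if (m->sp < 1) return 0;
            if (m->stack[--m->sp] != 1) return 0;
            break;
        case OUT_EVENT:            /* raise an event, popping the payload */
            if (m->sp < 1 || m->n_events == EVENTS_MAX) return 0;
            m->events[m->n_events++] = m->stack[--m->sp];
            break;
        }
    }
    return 1;
}
```

Note that a failed Test stops execution but keeps the events raised so far, which matches the focus on event sequences that occur before the program terminates.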

The execution of a program can get stuck if rule con- 
ditions are violated, for instance, when the program runs 
out of memory or attempts to write to uninitialised mem- 
ory. All these situations would likely result in a crash in a 
real system. Our work is not focused on preventing crashes, 
but rather on analysing the sequences of events that occur 
before the program terminates (either normally or abnor- 
mally). Thus we leave crashes implicit in the semantics. 
An exception is the instruction Load: reading uninitialised 
memory is unlikely to result in a crash in reality, instead 
it can silently return any value. We model this behaviour 
explicitly in the semantics. 

3. FROM C TO CVM 

We describe how to translate from C to CVM programs. 
We start with aspects of the translation that are particular 
to our approach, after which we illustrate the translation by 
applying it to the example program in fig. 1. 

Proving correctness of C compilation is not the main fo- 
cus of our work, so we trust compilation for now. To prove 
correctness formally one would need to show that a CVM 
translation simulates the original C program; an appropri- 
ate notion of simulation is defined in appendix B and is used 
to prove soundness of other verification steps. We believe 
that work on proving correctness of the CompCert compiler 
[31] can be reused in this context. 

We require that the C program contains no form of looping 
or function call cycles and that all actions of the program 
(either network outputs or events) happen in the same path 
(called main path in the following). We then prune all other 
paths by replacing if-statements on the main path by test 
statements: a statement if(cond) t_block else f_block 
is replaced by test(cond) ; t_block in case the main path 
continues in the t_block, and by test(!cond); f_block 
otherwise. The test statements are then compiled to CVM 
Test instructions. The main path can be easily identified 
by static analysis; for now we simply identify the path to be 
compiled by observing an execution of the program. 
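To illustrate the pruning step, the following sketch places a branching check, loosely modelled on fig. 1, next to its pruned form. The names check_original, check_pruned, self_check, the TEST macro and the outcome type are hypothetical and introduced only for this illustration; an early return stands in for aborting execution.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcome type: ACCEPTED means the event on the main path
   was reached, ABORTED means execution stopped before it. */
enum outcome { ABORTED, ACCEPTED };

/* Branching version, loosely following the checks in fig. 1. */
static enum outcome check_original(size_t len, int mac_ok) {
    if (len > 1000)
        return ABORTED;          /* off the main path */
    if (mac_ok)
        return ACCEPTED;         /* main path: the event is raised */
    else
        return ABORTED;          /* off the main path */
}

/* Stand-in for a test statement: stop unless the condition holds. */
#define TEST(cond) do { if (!(cond)) return ABORTED; } while (0)

/* Pruned version: each if-statement on the main path becomes a test. */
static enum outcome check_pruned(size_t len, int mac_ok) {
    TEST(!(len > 1000));         /* main path continues in the else branch */
    TEST(mac_ok);                /* main path continues in the then branch */
    return ACCEPTED;
}

/* Both versions reach the event on exactly the same inputs. */
static void self_check(void) {
    for (size_t len = 998; len <= 1002; len++)
        for (int mac_ok = 0; mac_ok <= 1; mac_ok++)
            assert(check_original(len, mac_ok) == check_pruned(len, mac_ok));
}
```

In this small example the pruned version has a single path through the code, which is the shape that the translation to CVM Test instructions relies on.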

As mentioned in the introduction, we do not verify the
source code of cryptographic functions, but instead trust
that they implement the cryptographic algorithms correctly.
Similarly, we would not be able to translate the source code
of functions like memcmp into CVM directly, as these functions
contain loops. Thus for the purpose of CVM translation
we provide an abstraction for these functions. We do so
by writing what we call a proxy function f_proxy for each
function f that needs to be abstracted. Whenever a call to
f is encountered during the translation, it is replaced by a
call to f_proxy. The proxy functions form the trusted base
of the verification.

    void mac_proxy(void * buf, size_t buflen,
                   void * key, size_t keylen,
                   void * mac){
      load_buf(buf, buflen);
      load_buf(key, keylen);
      apply("mac", 2);
      store_buf(mac);
    }

    int memcmp_proxy(void * a, void * b,
                     size_t len){
      int ret;
      load_buf(a, len);
      load_buf(b, len);
      apply("cmp", 2);
      store_buf(&ret);
      return ret;
    }

Figure 4: Examples of proxy functions.

Examples of proxy functions are shown in fig. 4. The functions
load_buf, apply and store_buf are treated specially
by the translation. For instance, assuming an architecture
with N = 32, a call load_buf(buf, len) directly generates
the sequence of instructions:

    Ref buf; Const i32; Load;
    Ref len; Const i32; Load; Load;



Similarly we provide proxies for all other special functions 
in the example program, such as readenv, read, write or 
event. The proxies essentially list the CVM instructions 
that need to be generated. 

Appendix H.2 shows more examples of proxy functions. 
Appendix A shows the CVM translation of our example C 
program in fig. 1. 

4. INTERMEDIATE MODEL LANGUAGE 

This section presents the intermediate model language 
(IML) that we use both to express the models extracted 
from CVM programs and to describe the environment in 
which the protocol participants execute. IML borrows most 
of its structure from the pi calculus [1, 14]. In addition it 
has access both to the set Ops of operations used by CVM 
programs and to primitive operations on bitstrings: con- 
catenation, substring extraction, and computing lengths of 
bitstrings. Unlike CVM, IML does not access memory or 
perform destructive updates. 

The syntax of IML is presented in fig. 5. In contrast to
the standard pi calculus we do not include channel names,
but implicitly use a single public channel instead. This
corresponds to our security model in which all communication
happens through the attacker. The nonce sampling operation
$(\nu x[e])$ takes an expression as a parameter that specifies
the length of the nonce to be sampled; this is necessary in
the computational setting in order to obtain a probability
distribution. We introduce a special abbreviation for programs
that choose randomness of length equal to the security
parameter $k_0$ introduced in section 2: let $(\nu x); P$ stand
for $(\nu \tilde{x}[k_0]);$ let $x = nonce(\tilde{x})$ in $P$, where $nonce \in Ops$.
Using nonce allows us to have tagged nonces, which will be
necessary to link to the pi calculus semantics from [4].

    b ∈ BS, x ∈ Var, op ∈ Ops

    e ∈ IExp ::=                  expression
          b                       concrete bitstring
          x                       variable
          op(e1, ..., en)         computation
          e1|e2                   concatenation
          e{eo, el}               substring extraction
          len(e)                  length
    P, Q ∈ IML ::=                process
          0                       nil
          !P                      replication
          P|Q                     parallel composition
          (νx[e]); P              randomness
          in(x); P                input
          out(e); P               output
          event(e); P             event
          if e then P [else Q]    conditional
          let x = e in P [else Q] evaluation

Figure 5: The syntax of IML.

\[
\begin{aligned}
\llbracket b \rrbracket &= b, \text{ for } b \in BS, \\
\llbracket x \rrbracket &= \bot, \text{ for } x \in Var, \\
\llbracket op(e_1, \dots, e_n) \rrbracket &= A_{op}(\llbracket e_1 \rrbracket, \dots, \llbracket e_n \rrbracket), \\
\llbracket e_1 | e_2 \rrbracket &= \llbracket e_1 \rrbracket | \llbracket e_2 \rrbracket, \\
\llbracket e\{e_o, e_l\} \rrbracket &= sub(\llbracket e \rrbracket, val(\llbracket e_o \rrbracket), val(\llbracket e_l \rrbracket)), \\
\llbracket len(e) \rrbracket &= bs(|\llbracket e \rrbracket|).
\end{aligned}
\]

Figure 6: The evaluation of IML expressions,
whereby $\bot$ propagates.

For a bitstring $b$ let $b[i]$ be the $i$th bit of $b$ counting from
0. The concatenation of two bitstrings $b_1$ and $b_2$ is written
as $b_1|b_2$.

Just as for CVM, the semantics of IML is parameterised
by functions bs and val. The semantics of expressions is
given by the partial function $\llbracket\cdot\rrbracket : IExp \rightharpoonup BS$ described in
fig. 6. The partial function $sub : BS \times \mathbb{N} \times \mathbb{N} \rightharpoonup BS$ extracts
a substring of a given bitstring such that $sub(b, o, l)$ is the
substring of $b$ starting at offset $o$ of length $l$:

\[
sub(b, o, l) =
\begin{cases}
b[o] \dots b[o + l - 1] & \text{if } o + l \le |b|, \\
\bot & \text{otherwise.}
\end{cases}
\]

For a valuation $\eta : Var \rightharpoonup BS$ we denote with $\llbracket e \rrbracket_\eta$ the result
of substituting all variables $v$ in $e$ by $\eta(v)$ (if defined) and
then applying $\llbracket\cdot\rrbracket$.
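The partial function sub can be sketched in C as follows. This is a minimal illustration under stated assumptions: bitstrings are modelled as NUL-terminated strings over the characters '0' and '1', and the undefined result is modelled as NULL. Neither choice comes from the formal development, and an allocation failure is also reported as NULL here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* sub(b, o, l): the l bits of b starting at offset o, or NULL when
   o + l exceeds the length of b. The result is freshly allocated and
   must be released by the caller with free(). */
static char *sub(const char *b, size_t o, size_t l) {
    assert(b != NULL);
    size_t n = strlen(b);
    /* The condition o + l <= |b|, written so that it cannot overflow. */
    if (o > n || l > n - o)
        return NULL;
    char *r = malloc(l + 1);
    if (r == NULL)
        return NULL;
    memcpy(r, b + o, l);
    r[l] = '\0';
    return r;
}
```

For example, with b = 101100 the call sub(b, 2, 3) yields 110, while sub(101, 2, 2) is undefined because 2 + 2 exceeds the length 3.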

The formal semantics of IML is mostly straightforward 
and is shown in detail in appendix D. 



5. SECURITY OF PROTOCOLS 

This section gives an informal overview of our security 
definition. The complete definition is given in appendix B.

To define security for protocols implemented by CVM and 
IML programs we need to specify what a protocol is and 
give a mapping from programs to protocols. The notion 
of a protocol is formally captured by a protocol transition 
system (PTS), which describes how processes evolve and
interact with the attacker. A PTS is a set of transitions of
the form $(\eta, s) \xrightarrow{l} \{(\eta_1, s_1), \dots, (\eta_n, s_n)\}$, where $\eta$ and $\eta_i$ are
environments (modelled as valuations), $s$ and $s_i$ are semantic
configurations of the underlying programming language,
and $l$ is an action label. Actions can include reading values
from the attacker, generating random values, sending values
to the attacker, or raising events. We call a pair $(\eta, s)$ an
executing process. Multiple processes on the right hand side
capture replication.

The semantics of CVM and IML are given in terms of the
PTS that are implemented by programs. For a CVM program
$P$ we denote with $\llbracket P \rrbracket_C$ the PTS that is implemented
by $P$. Similarly, for an IML process $P$ the corresponding
PTS is denoted by $\llbracket P \rrbracket_I$.

Given a PTS T and a probabilistic machine E (an at- 
tacker) we can execute T in the presence of E. The state of 
the executing protocol is essentially a multiset of executing 
processes. The attacker repeatedly chooses a process from 
the multiset which is then allowed to perform an action ac- 
cording to T. The result of the execution is a sequence of 
raised events. For a resource bound $t \in \mathbb{N}$ we denote with
$Events(T, E, t)$ the sequence of events raised during the first
t steps of the execution. We shall be interested in the proba- 
bility that this sequence of events belongs to a certain "safe" 
set. This is formally captured by the following definition: 

Definition 1 (Protocol security) We define a trace property
as a polynomially decidable prefix-closed set of event
sequences. For a PTS $T$, a trace property $p$ and a resource
bound $t \in \mathbb{N}$ let $insec(T, p, t)$ be the probability

\[
\sup\{\Pr[Events(T, E, t) \notin p] \mid E \text{ attacker}, |E| \le t\},
\]

where $|E|$ measures the size of the description of the attacker.

Intuitively insec(T, p, t) measures the success probability 
of the most successful attack against T and property p when 
both the execution time of the attack and the size of the 
attacker code are bounded by t. 

Since the semantics of CVM and IML are in the same formalism,
we may combine the sets of semantic rules and obtain
semantics $\llbracket\cdot\rrbracket_{CI}$ for mixed programs, where a CVM program
can be a subprocess of a larger IML process. We add
an additional syntactic form $[]_i$ (a hole) with $i \in \mathbb{N}$ and no
reductions to IML. For an IML process $P_E$ with $n$ holes and
CVM or IML processes $P_1, \dots, P_n$ we write $P_E[P_1, \dots, P_n]$
to denote process $P_E$ where each hole $[]_i$ is replaced by $P_i$.
The semantics of the resulting process $P'_E$, denoted with
$\llbracket P'_E \rrbracket_{CI}$, is defined in appendix D.

Being able to embed a CVM program within an IML process
is useful for modelling. As an example, let $P_1$ be the
CVM program resulting from the translation of the C code
in fig. 1 and let $P_2$ be a description of another participant
of the protocol, in either CVM or IML. Then we might be
interested in the security of the following process:

$P_E[P_1, P_2] = (\nu k); ((!P_1) | (!P_2)).$



    v ∈ Var, i ∈ N

    pb ∈ PBase ::=          pointer base
          stack v           stack pointer to variable v
          heap i            heap pointer with id i
    e ∈ SExp ::=            symbolic expression
          ptr(pb, e)        pointer
          ...               same as IExp in fig. 5

Figure 7: Symbolic expressions.

A trace property $\rho$ of interest might be, for instance, "Each
event of the form accept(x) is preceded by an event of the form
request(x)", where request is an event possibly raised in $P_2$. The
goal is to obtain a statement about the probability
$\mathrm{insec}([\![P_E[P_1, P_2]]\!]_{CI}, \rho, t)$ for various $t$.
The next section shows how we can relate the security of
$P_E[P_1, P_2]$ to the security of $P_E[\tilde{P}_1, P_2]$, where the
IML process $\tilde{P}_1$ is a model of the CVM process $P_1$,
extracted by symbolic execution.
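To make the example property concrete, the following is a minimal Python sketch of a checker for the correspondence "each accept(x) is preceded by request(x)". The event encoding as `(name, payload)` pairs and the function names are illustrative assumptions, not part of the formal development. The second helper checks, on a single trace, the prefix-closure that Definition 1 demands of a trace property.

```python
from typing import Iterable, List, Tuple

# Assumed encoding: an event is a (name, payload) pair, e.g. ("accept", "m1").
Event = Tuple[str, str]

def satisfies_correspondence(trace: Iterable[Event]) -> bool:
    """Return True iff every accept(x) is preceded by some request(x)."""
    requested = set()
    for name, payload in trace:
        if name == "request":
            requested.add(payload)
        elif name == "accept" and payload not in requested:
            return False
    return True

def prefixes_also_satisfy(trace: List[Event]) -> bool:
    """For a trace inside the property, check that all its prefixes are too."""
    if not satisfies_correspondence(trace):
        return True  # nothing to check for traces outside the property
    return all(satisfies_correspondence(trace[:i]) for i in range(len(trace) + 1))

good = [("request", "m1"), ("accept", "m1")]
bad = [("accept", "m1"), ("request", "m1")]
print(satisfies_correspondence(good), satisfies_correspondence(bad))
```

The checker runs in time linear in the trace length, in line with the requirement that trace properties be polynomially decidable.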

6. CVM TO IML: SYMBOLIC EXECUTION 

We describe how to automatically extract an IML model 
from a CVM program while preserving security properties. 
The key idea is to execute a CVM program in a symbolic 
semantics, where, instead of concrete bitstrings, memory lo- 
cations contain IML expressions representing the set of all 
possible concrete values at a given execution point. 

To track the values used as pointers during CVM execution, we extend
IML expressions with an additional construct, resulting in the class
of symbolic expressions shown in fig. 7. An expression of the form
$\mathrm{ptr}(pb, e_o)$ represents a pointer into the memory location
identified by the pointer base $pb$ with an offset $e_o$ relative to
the beginning of the location. We require that
$e_o \in \mathrm{IExp}$, so that pointer offsets do not contain
pointers themselves. Pointer bases are of two kinds: a base of the
form stack $v$ represents a pointer to the program variable $v$ and a
base of the form heap $i$ represents the result of a Malloc.

Symbolic execution makes certain assumptions about the arithmetic
operations that are available in Ops. We assume that programs use
operators for bitwise addition and subtraction (with overflow) that we
shall write as $+_b$ and $-_b$. We also make use of addition and
subtraction without overflow: the addition operator (written as
$+_{\mathbb{N}}$) is expected to widen its result as necessary and the
subtraction operator (written as $-_{\mathbb{N}}$) returns $\bot$
instead of a negative result. We assume that Ops contains comparison
operators $=$, $<$, and $\le$ such that $=(a, b)$ returns $i1$ if
$\mathrm{val}(a) = \mathrm{val}(b)$ and $i0$ otherwise, similarly for
the other operators. This way $<$ and $\le$ capture unsigned
comparisons on bitstring values. We assume Ops contains logical
connectives $\neg$ and $\lor$ that interpret $i0$ as the false value
and $i1$ as the true value. These operators may or may not be the ones
used by the program itself.

To evaluate symbolic expressions concretely, we need concrete values
for pointer bases as well as concrete values for variables. Given an
extended valuation $\eta : \mathrm{Var} \cup \mathrm{PBase} \to BS$,
we extend the function $[\![\cdot]\!]_\eta$ from fig. 6 by the rule:

$$[\![\mathrm{ptr}(pb, e_o)]\!]_\eta = \eta(pb) +_b [\![e_o]\!]_\eta.$$

When applying arithmetic operations to pointers, we need 
to make sure that the operation is applied to the pointer 
offset and the base is kept intact. This behaviour is encoded 



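(S-Init)
$$(\mathrm{Init}, P) \to (\Sigma_{op},\ \{\mathrm{stack}\ v \mapsto \mathrm{bs}(N) \mid v \in \mathrm{var}(P)\},\ \{\mathrm{stack}\ v \mapsto \varepsilon \mid v \in \mathrm{var}(P)\},\ [\,],\ P)$$

(S-Const)
$$(\Sigma, \mathcal{A}^s, \mathcal{M}^s, \mathcal{S}^s, \mathrm{Const}\ b; P) \to (\Sigma, \mathcal{A}^s, \mathcal{M}^s, b :: \mathcal{S}^s, P)$$

(S-Ref)
$$(\Sigma, \mathcal{A}^s, \mathcal{M}^s, \mathcal{S}^s, \mathrm{Ref}\ v; P) \to (\Sigma, \mathcal{A}^s, \mathcal{M}^s, \mathrm{ptr}(\mathrm{stack}\ v, i0) :: \mathcal{S}^s, P)$$

(S-Malloc)
$$\frac{e_l \in \mathrm{IExp} \qquad i \in \mathbb{N} \text{ minimal s.t. } pb = \mathrm{heap}\ i \notin \mathrm{dom}(\mathcal{A}^s)}{(\Sigma, \mathcal{A}^s, \mathcal{M}^s, e_l :: \mathcal{S}^s, \mathrm{Malloc}; P) \to (\Sigma, \mathcal{A}^s\{pb \mapsto e_l\}, \mathcal{M}^s\{pb \mapsto \varepsilon\}, \mathrm{ptr}(pb, i0) :: \mathcal{S}^s, P)}$$

(S-Load)
$$\frac{pb \in \mathrm{dom}(\mathcal{M}^s) \qquad e = \mathrm{simplify}_\Sigma(\mathcal{M}^s(pb)\{e_o, e_l\}) \qquad \Sigma \vdash (e_o +_{\mathbb{N}} e_l \le \mathrm{getLen}(\mathcal{M}^s(pb)))}{(\Sigma, \mathcal{A}^s, \mathcal{M}^s, e_l :: \mathrm{ptr}(pb, e_o) :: \mathcal{S}^s, \mathrm{Load}; P) \to (\Sigma, \mathcal{A}^s, \mathcal{M}^s, e :: \mathcal{S}^s, P)}$$

(S-In)
$$\frac{e_l \in \mathrm{IExp} \qquad l = (\text{if } src = \mathrm{read} \text{ then } \mathrm{in}(v); \text{ else } (\nu v[e_l]);)}{(\Sigma, \mathcal{A}^s, \mathcal{M}^s, e_l :: \mathcal{S}^s, \mathrm{In}\ v\ src; P) \xrightarrow{l} (\Sigma \cup \{\mathrm{len}(v) = e_l\}, \mathcal{A}^s, \mathcal{M}^s, v :: \mathcal{S}^s, P)}$$

(S-Env)
$$(\Sigma, \mathcal{A}^s, \mathcal{M}^s, \mathcal{S}^s, \mathrm{Env}\ v; P) \to (\Sigma, \mathcal{A}^s, \mathcal{M}^s, \mathrm{len}(v) :: v :: \mathcal{S}^s, P)$$

(S-Apply)
$$\frac{e = \mathrm{apply}(op, e_1, \dots, e_n) \ne \bot}{(\Sigma, \mathcal{A}^s, \mathcal{M}^s, e_1 :: \dots :: e_n :: \mathcal{S}^s, \mathrm{Apply}\ op; P) \to (\Sigma, \mathcal{A}^s, \mathcal{M}^s, \mathrm{len}(e) :: e :: \mathcal{S}^s, P)}$$

(S-Out)
$$\frac{e \in \mathrm{IExp} \qquad l = (\text{if } dest = \mathrm{write} \text{ then } \mathrm{out}(e); \text{ else } \mathrm{event}(e);)}{(\Sigma, \mathcal{A}^s, \mathcal{M}^s, e :: \mathcal{S}^s, \mathrm{Out}\ dest; P) \xrightarrow{l} (\Sigma, \mathcal{A}^s, \mathcal{M}^s, \mathcal{S}^s, P)}$$

(S-Test)
$$\frac{e \in \mathrm{IExp}}{(\Sigma, \mathcal{A}^s, \mathcal{M}^s, e :: \mathcal{S}^s, \mathrm{Test}; P) \xrightarrow{\text{if } e \text{ then}} (\Sigma \cup \{e\}, \mathcal{A}^s, \mathcal{M}^s, \mathcal{S}^s, P)}$$

(S-Store)
Premises: $e_h = \mathcal{M}^s(pb) \ne \bot$, $e_s = \mathcal{A}^s(pb) \ne \bot$, $e_{lh} = \mathrm{getLen}(e_h)$, $e_l = \mathrm{getLen}(e)$, and
either $\Sigma \vdash (e_o +_{\mathbb{N}} e_l \le e_{lh})$ and $e'_h = \mathrm{simplify}_\Sigma(e_h\{i0, e_o\} \mid e \mid e_h\{e_o +_{\mathbb{N}} e_l,\ e_{lh} -_{\mathbb{N}} (e_o +_{\mathbb{N}} e_l)\})$
or $\Sigma \vdash (e_o +_{\mathbb{N}} e_l > e_{lh}) \land (e_o \le e_{lh}) \land (e_o +_{\mathbb{N}} e_l \le e_s)$ and $e'_h = \mathrm{simplify}_\Sigma(e_h\{i0, e_o\} \mid e)$.
$$(\Sigma, \mathcal{A}^s, \mathcal{M}^s, \mathrm{ptr}(pb, e_o) :: e :: \mathcal{S}^s, \mathrm{Store}; P) \to (\Sigma, \mathcal{A}^s, \mathcal{M}^s\{pb \mapsto e'_h\}, \mathcal{S}^s, P)$$

Figure 8: The symbolic execution of CVM.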



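by the function apply, defined as follows:

$$\mathrm{apply}(+_b, \mathrm{ptr}(pb, e_o), e) = \mathrm{ptr}(pb, e_o +_b e), \quad \text{for } e \in \mathrm{IExp},$$
$$\mathrm{apply}(-_b, \mathrm{ptr}(pb, e_o), \mathrm{ptr}(pb, e'_o)) = e_o -_b e'_o,$$
$$\mathrm{apply}(op, e_1, \dots, e_n) = op(e_1, \dots, e_n), \quad \text{for } e_1, \dots, e_n \in \mathrm{IExp},$$
$$\mathrm{apply}(\dots) = \bot, \quad \text{otherwise.}$$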
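The four cases of apply can be sketched in Python as follows. This is only an illustration under assumptions: the class names, the use of strings for pointer bases and operator names, and `None` standing for $\bot$ are hypothetical choices, not the data structures of the actual implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

@dataclass(frozen=True)
class Const:
    name: str                 # e.g. "i0" or "i20"

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Op:
    name: str
    args: Tuple["Expr", ...]

@dataclass(frozen=True)
class Ptr:
    base: str                 # e.g. "stack buf" or "heap 2"
    offset: "Expr"            # required to be pointer-free

Expr = Union[Const, Var, Op, Ptr]

def is_iexp(e: Expr) -> bool:
    """True iff e is a plain IML expression, i.e. contains no pointer."""
    if isinstance(e, Ptr):
        return False
    if isinstance(e, Op):
        return all(is_iexp(a) for a in e.args)
    return True

def apply(op: str, *args: Expr) -> Optional[Expr]:
    """Apply op symbolically; None plays the role of the undefined result."""
    # Pointer plus plain expression: shift the offset, keep the base.
    if op == "+b" and len(args) == 2 and isinstance(args[0], Ptr) and is_iexp(args[1]):
        return Ptr(args[0].base, Op("+b", (args[0].offset, args[1])))
    # Difference of two pointers into the same location: subtract offsets.
    if (op == "-b" and len(args) == 2
            and all(isinstance(a, Ptr) for a in args)
            and args[0].base == args[1].base):
        return Op("-b", (args[0].offset, args[1].offset))
    # Any operation on pointer-free arguments stays symbolic.
    if all(is_iexp(a) for a in args):
        return Op(op, tuple(args))
    return None

buf = Ptr("heap 2", Const("i0"))
print(apply("+b", buf, Var("l")))
```

Every other use of a pointer (multiplying it, or subtracting pointers with different bases) falls through to the last case and yields the undefined result, so (S-Apply) does not fire.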



As well as tracking the expressions stored in memory, we also track
logical facts discovered during symbolic execution. To record these
facts, we use symbolic expressions themselves, interpreted as logical
formulas with $=$, $<$, and $\le$ as relations and $\neg$ and $\lor$
as connectives. We allow quantifiers in formulas, with straightforward
interpretation. Given a set $\Sigma$ of formulas and a formula
$\varphi$ we write $\Sigma \vdash \varphi$ iff for each
$\Sigma$-consistent valuation $\eta$ (that is, a valuation such that
$[\![\psi]\!]_\eta = i1$ for all $\psi \in \Sigma$) we also have
$[\![\varphi]\!]_\eta = i1$.

To check the entailment relation, our implementation re- 
lies on the SMT solver Yices [21], by replacing unsupported 
operations, such as string concatenation or substring extrac- 
tion, with uninterpreted functions. This works well for our 
purpose — the conditions that we need to check during the 
symbolic execution are purely arithmetic and are supported 
by Yices' theory. 

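The function getLen returns for each symbolic expression an expression
representing its length:

$$\mathrm{getLen}(\mathrm{ptr}(\dots)) = \mathrm{bs}(N),$$
$$\mathrm{getLen}(\mathrm{len}(\dots)) = \mathrm{bs}(N),$$
$$\mathrm{getLen}(b) = \mathrm{bs}(|b|), \quad \text{for } b \in BS,$$
$$\mathrm{getLen}(x) = \mathrm{len}(x), \quad \text{for } x \in \mathrm{Var},$$
$$\mathrm{getLen}(op(e_1, \dots, e_n)) = \mathrm{len}(op(e_1, \dots, e_n)),$$
$$\mathrm{getLen}(e_1 | e_2) = \mathrm{getLen}(e_1) +_{\mathbb{N}} \mathrm{getLen}(e_2),$$
$$\mathrm{getLen}(e\{e_o, e_l\}) = e_l.$$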
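The case analysis of getLen translates almost line by line into code. The following Python sketch assumes a word width of `N = 8` bytes and a big-endian encoding for `bs`; both are arbitrary choices for illustration, as are the class names.

```python
from dataclasses import dataclass

N = 8  # assumed machine word width in bytes (hypothetical choice)

@dataclass(frozen=True)
class Const:
    value: bytes

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Op:
    name: str
    args: tuple

@dataclass(frozen=True)
class Len:
    arg: object

@dataclass(frozen=True)
class Concat:
    left: object
    right: object

@dataclass(frozen=True)
class Range:
    arg: object
    offset: object
    length: object

@dataclass(frozen=True)
class Ptr:
    base: str
    offset: object

def bs(n: int) -> Const:
    """Encode n as an N-byte constant (big-endian is an assumption here)."""
    return Const(n.to_bytes(N, "big"))

def get_len(e):
    """Return an expression denoting the length of e, following the cases above."""
    if isinstance(e, (Ptr, Len)):
        return bs(N)                      # pointers and lengths are word-sized
    if isinstance(e, Const):
        return bs(len(e.value))           # concrete bitstring: concrete length
    if isinstance(e, (Var, Op)):
        return Len(e)                     # unknown length stays symbolic
    if isinstance(e, Concat):
        return Op("+N", (get_len(e.left), get_len(e.right)))
    if isinstance(e, Range):
        return e.length                   # a range carries its own length
    raise TypeError(f"not a symbolic expression: {e!r}")

x1 = Var("x1")
tag = Op("mac", (Var("k"), x1))
print(get_len(Concat(x1, tag)))
```

Note that the result is itself a symbolic expression, which is what lets the rules (S-Load) and (S-Store) pass it to the entailment check.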

We assume that the knowledge about the return lengths of operation
applications is encoded in a fact set $\Sigma_{op}$. As an example,
$\Sigma_{op}$ might contain the facts:

$$\forall x, y, a:\ \mathrm{len}(x) = a \land \mathrm{len}(y) = a \Rightarrow \mathrm{len}(x +_b y) = a,$$
$$\forall x:\ \mathrm{len}(\mathrm{sha1}(x)) = i20.$$

We assume that $\Sigma_{op}$ is consistent: $\vdash \varphi$ for all
$\varphi \in \Sigma_{op}$.



The transformations prescribed by the symbolic semantic rules would
quickly lead to very large expressions. Thus the symbolic execution is
parametrised by a simplification function simplify that is allowed to
make use of the collected fact set $\Sigma$. We demand that the
simplification function is sound in the following sense: for each fact
set $\Sigma$, expression $e$ and a $\Sigma$-consistent valuation
$\eta$ we have

$$[\![e]\!]_\eta \ne \bot \implies [\![\mathrm{simplify}_\Sigma(e)]\!]_\eta = [\![e]\!]_\eta.$$

The simplifications employed in our algorithm are described in
appendix E.
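As an illustration of the soundness condition (not of the actual simplifier from appendix E), here is a small Python sketch of one plausible rewrite: a range that covers exactly one side of a concatenation is replaced by that side. The sketch assumes concrete integer offsets and a fact set that only records variable lengths; both are simplifications made for the example. Soundness is then tested by brute force over a few consistent valuations.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, Iterator, Optional

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Concat:
    left: object
    right: object

@dataclass(frozen=True)
class Range:
    arg: object
    offset: int    # assumed concrete in this sketch
    length: int

def evaluate(e, eta: Dict[str, bytes]) -> Optional[bytes]:
    """Concrete evaluation; None stands for the undefined value."""
    if isinstance(e, Var):
        return eta[e.name]
    if isinstance(e, Concat):
        l, r = evaluate(e.left, eta), evaluate(e.right, eta)
        return None if l is None or r is None else l + r
    v = evaluate(e.arg, eta)
    if v is None or e.offset + e.length > len(v):
        return None
    return v[e.offset:e.offset + e.length]

def known_len(e, facts: Dict[str, int]) -> Optional[int]:
    if isinstance(e, Var):
        return facts.get(e.name)
    if isinstance(e, Concat):
        l, r = known_len(e.left, facts), known_len(e.right, facts)
        return None if l is None or r is None else l + r
    return e.length

def simplify(e, facts: Dict[str, int]):
    """Rewrite (a|b){0, len(a)} to a and (a|b){len(a), len(b)} to b."""
    if isinstance(e, Range) and isinstance(e.arg, Concat):
        la = known_len(e.arg.left, facts)
        lb = known_len(e.arg.right, facts)
        if la is not None and e.offset == 0 and e.length == la:
            return simplify(e.arg.left, facts)
        if la is not None and lb is not None and e.offset == la and e.length == lb:
            return simplify(e.arg.right, facts)
    return e

def consistent_valuations(facts: Dict[str, int], alphabet: bytes = b"ab") -> Iterator[Dict[str, bytes]]:
    """Enumerate valuations whose variable lengths agree with the facts."""
    names = sorted(facts)
    pools = [[bytes(p) for p in product(alphabet, repeat=facts[n])] for n in names]
    for combo in product(*pools):
        yield dict(zip(names, combo))

facts = {"x1": 2, "t": 3}
e = Range(Concat(Var("x1"), Var("t")), 2, 3)
print(simplify(e, facts))
```

The brute-force loop is of course no proof; it only shows what the soundness statement quantifies over.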



No.  C line                                   symbolic memory updates         new facts       generated IML line

1.   readenv("k", &key, &keylen);             stack key => ptr(heap 1, i0)
                                              heap 1 => k
                                              stack keylen => len(k)
2.   read(&len, sizeof(len));                 stack len => l                  len(l) = iN     in(l)
3.   if (len > 1000) exit();                                                  ¬(l > i1000)
4.   void *buf = malloc(len + 2 * MAC_LEN);   stack buf => ptr(heap 2, i0)
                                              heap 2 => ε
5.   read(buf, len);                          heap 2 => x1                    len(x1) = l     in(x1)
6.   mac(buf, len, key, keylen, buf + len);   heap 2 => x1|mac(k, x1)
7.   read(buf + len + MAC_LEN, MAC_LEN);      heap 2 => x1|mac(k, x1)|x2      len(x2) = i20   in(x2)
8.   if (memcmp(...) == 0)                                                                    if mac(k, x1) = x2 then
9.   event("accept", buf, len);                                                               event accept(x1)

Figure 9: Symbolic execution of the example in fig. 1.



The algorithm for symbolic execution is determined by the set of
semantic rules presented in fig. 8. The initial semantic configuration
has the form $(\mathrm{Init}, P)$ with the executing program
$P \in \mathrm{CVM}$. The other semantic configurations have the form
$(\Sigma, \mathcal{A}^s, \mathcal{M}^s, \mathcal{S}^s, P)$, where

• $\Sigma \subseteq \mathrm{SExp}$ is a set of formulas (the path
  condition),

• $\mathcal{A}^s : \mathrm{PBase} \to \mathrm{SExp}$ is the symbolic
  allocation table that for each memory location stores its allocated
  size,

• $\mathcal{M}^s : \mathrm{PBase} \to \mathrm{SExp}$ is the symbolic
  memory. We require that
  $\mathrm{dom}(\mathcal{M}^s) = \mathrm{dom}(\mathcal{A}^s)$,

• $\mathcal{S}^s$ is a list of symbolic expressions representing the
  execution stack,

• $P \in \mathrm{CVM}$ is the executing program.

The symbolic execution rules essentially mimic the rules of the
concrete execution. The crucial rules are (S-Load) and (S-Store), which
reflect the effect of loading and storing memory values on the
symbolic level. The rule (S-Load) is quite simple: it tries to deduce
from $\Sigma$ that the extraction is performed from a defined memory
range, after which it represents the result of the extraction using an
IML range expression. The rule (S-Store) distinguishes between two
cases depending on how the expression $e$ to be stored is aligned with
the expression $e_h$ that is already present in memory. If $e$ needs
to be stored completely within the bounds of $e_h$ then we replace the
contents of the memory location by $e_h\{\dots\} | e | e_h\{\dots\}$,
where the first and the second range expression represent the pieces
of $e_h$ that are not covered by $e$. In case $e$ needs to be stored
past the end of $e_h$, the new expression is of the form
$e_h\{\dots\} | e$. The rule still requires that the beginning of $e$
is positioned before the end of $e_h$, and hence it is impossible to
write in the middle of an uninitialised memory location. This is for
simplicity of presentation: the rule used in our implementation does
not have this limitation (it creates an explicit "undefined"
expression in these cases).
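The two cases of (S-Store) can be illustrated with a small Python sketch. To keep it short it assumes that all offsets and lengths are concrete integers, so that the entailment checks become plain comparisons, and it uses a tiny `tidy` function in place of the real simplifier. All names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sym:
    name: str
    length: int    # assumed known and concrete in this sketch

@dataclass(frozen=True)
class Concat:
    left: object
    right: object

@dataclass(frozen=True)
class Range:
    arg: object
    offset: int
    length: int

EMPTY = Sym("empty", 0)

def length(e) -> int:
    if isinstance(e, Concat):
        return length(e.left) + length(e.right)
    return e.length

def tidy(e):
    """Stand-in simplifier: drop empty pieces and ranges covering everything."""
    if isinstance(e, Range):
        arg = tidy(e.arg)
        if e.length == 0:
            return EMPTY
        if e.offset == 0 and e.length == length(arg):
            return arg
        return Range(arg, e.offset, e.length)
    if isinstance(e, Concat):
        l, r = tidy(e.left), tidy(e.right)
        if length(l) == 0:
            return r
        if length(r) == 0:
            return l
        return Concat(l, r)
    return e

def store(e_h, alloc: int, e_o: int, e):
    """Return the new contents after storing e at offset e_o, or None if stuck."""
    l_h, end = length(e_h), e_o + length(e)
    if end <= l_h:
        # First case: e lies within the existing contents.
        return tidy(Concat(Concat(Range(e_h, 0, e_o), e), Range(e_h, end, l_h - end)))
    if e_o <= l_h and end <= alloc:
        # Second case: e extends past the end but starts inside or at the end.
        return tidy(Concat(Range(e_h, 0, e_o), e))
    return None

# Replaying the writes to heap 2 from fig. 9 with an assumed l = 5.
x1, tag, x2 = Sym("x1", 5), Sym("mac(k,x1)", 20), Sym("x2", 20)
alloc = 5 + 2 * 20
mem = store(EMPTY, alloc, 0, x1)
mem = store(mem, alloc, 5, tag)
mem = store(mem, alloc, 25, x2)
print(mem)
```

A write that starts beyond the current end of the contents, or that would exceed the allocated size, makes `store` return `None`, mirroring the situation where neither premise of the rule can be established.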

Since all semantic rules are deterministic there is only one symbolic
execution trace. Some semantic transition rules are labelled with
parts of IML syntax. The sequence of these labels produces an IML
process that simulates the behaviour of the original CVM program.
Formally, for a CVM program $P$, let $L$ be the symbolic execution
trace starting from the state $(\mathrm{Init}, P)$. If $L$ ends in a
state with an empty program, let $\lambda_1, \dots, \lambda_n$ be the
sequence of labels of $L$ and set
$[\![P]\!]_S = \lambda_1 \dots \lambda_n 0 \in \mathrm{IML}$,
otherwise set $[\![P]\!]_S = \bot$.

We shall say that a polynomial is fixed iff it is independent of the
arbitrary values assumed in this paper, such as $N$ or the properties
of the set Ops. Our main result relates the security of $P$ to the
security of $[\![P]\!]_S$.



Theorem 1 (Symbolic Execution is Sound) There exists a fixed
polynomial $p$ such that if $P_1, \dots, P_n$ are CVM processes and
for each $i$ $\tilde{P}_i := [\![P_i]\!]_S \ne \bot$ then for any IML
process $P_E$, any trace property $\rho$, and resource bound
$t \in \mathbb{N}$:

$$\mathrm{insec}([\![P_E[P_1, \dots, P_n]]\!]_{CI}, \rho, t) \le \mathrm{insec}([\![P_E[\tilde{P}_1, \dots, \tilde{P}_n]]\!]_{I}, \rho, p(t)).$$



The condition that $p$ is fixed is important: otherwise $p$ could be
large enough to give the attacker the time to enumerate all memory
configurations. For practical use the actual shape of $p$ can be
recovered from the proof of the theorem given in appendix F.

Fig. 9 illustrates our method by showing how the symbolic execution
proceeds for our example in fig. 1. For each line of the C program we
show updates to the symbolic memory, the set of new facts, and the
generated IML code if any. In our example MAC_LEN is assumed to be 20
and $N$ is equal to sizeof(size_t). The variables $l$, $x_1$, and
$x_2$ are arbitrary fresh variables chosen during the translation from
C to CVM (see appendix A). Below we mention details for some
particularly interesting steps (numbers correspond to line numbers in
fig. 9).

1. The call to readenv redirects to a proxy function that generates
   CVM instructions for retrieving the environment variable $k$ and
   storing it in memory.

4. A new empty memory location is created and the pointer to it is
   stored in buf. We make an entry in the allocation table
   $\mathcal{A}^s$ with the length of the new memory location
   ($l +_b i2 * i20$).

5. We check that the stored value fits within the allocated memory
   area, that is, $l \le l +_b i2 * i20$. This is in general not true
   due to the possibility of integer overflow, but in this case
   succeeds due to the condition $\neg(l > i1000)$ recorded before
   (assuming that the maximum representable integer value is much
   larger than 1000). Similar checks are performed for all subsequent
   writes to memory.

7. The memory update is performed through an intermediate pointer
   value of the form $\mathrm{ptr}(\mathrm{heap}\ 2, l +_b i20)$. The
   set of collected facts is enough to deduce that this pointer points
   exactly at the end of $x_1 | mac(k, x_1)$.

8. The proxy function for memcmp extracts values $e_1 = e\{l, i20\}$
   and $e_2 = e\{l +_b i20, i20\}$, where $e$ is the contents of
   memory at heap 2, and puts $cmp(e_1, e_2)$ on the stack. With the
   facts collected so far $e_1$ simplifies to $mac(k, x_1)$ and $e_2$
   simplifies to $x_2$. With some special comprehension for the
   meaning of cmp we generate the IML line "if $e_1 = e_2$ then".
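The overflow argument in step 5 can be checked numerically. The Python sketch below assumes a 32-bit machine word (the paper leaves the width as a parameter) and models the wrapping addition by masking.

```python
BITS = 32                  # assumed word width in bits; a parameter in the paper
MASK = (1 << BITS) - 1
MAC_LEN = 20

def add_b(a: int, b: int) -> int:
    """Fixed-width addition with wrap-around."""
    return (a + b) & MASK

def fits(l: int) -> bool:
    """The check from step 5: l <= l +b 2 * MAC_LEN."""
    return l <= add_b(l, 2 * MAC_LEN)

# With the recorded fact not(l > 1000) the check always succeeds.
print(all(fits(l) for l in range(0, 1001)))

# Without that fact a length close to the maximum wraps around and the check fails.
print(fits(MASK - 10))
```

This is why the path condition collected at the earlier `if (len > 1000) exit();` line matters: without it the solver could not discharge the bounds check for the write in step 5.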



7. VERIFICATION OF IML 

The symbolic model extracted in fig. 9 does not con- 
tain any bitstring operations, so it can readily be given to 
ProVerif for verification. In general this is not the case and 
some further simplifications are required. In a nutshell, the 
simplifications are based on the observation that the bit- 
string expressions (concatenation and substring extraction) 
are meant to represent pairing and projection operations, so 
we can replace them by new symbolic operations that behave 
as pairing constructs in ProVerif. We then check that the 
expressions indeed satisfy the algebraic properties expected 
of such operations. 

We outline the main results regarding the translation to ProVerif.
Appendix G contains the details. The pi calculus used by ProVerif can
be described as a subset of IML from which the bitstring operations
have been removed. Unlike CVM and IML, the semantics of pi is given
with respect to an arbitrary security parameter: we write
$[\![P]\!]_\pi^k$ for the semantics of a pi process $P$ with respect
to the parameter $k \in \mathbb{N}$. In contrast, we consider IML as
executing with respect to a fixed security parameter
$k_0 \in \mathbb{N}$. For an IML process $P$ we specify conditions
under which it is translatable to a pi process $\tilde{P}$.

Theorem 2 (Soundness of the translation) There exists a fixed
polynomial $p$ such that for any $P \in \mathrm{IML}$ translatable to
a pi process $\tilde{P}$, any trace property $\rho$ and resource bound
$t \in \mathbb{N}$:

$$\mathrm{insec}([\![P]\!]_I, \rho, t) \le \mathrm{insec}([\![\tilde{P}]\!]_\pi^{k_0}, \rho, p(t)).$$

Backes et al. [4] provide an example of a set of operations Ops and a
set of soundness conditions restricting their implementations that are
sufficient for establishing computational soundness. The set Ops
contains a public key encryption operation that is required to be
IND-CCA secure. The soundness result is established for the class of
the so-called key-safe processes that always use fresh randomness for
encryption and key generation, only use honestly generated decryption
keys and never send decryption keys around.

Theorem 3 (Computational soundness) Let $P$ be a pi process using only
operations in Ops such that the soundness conditions are satisfied. If
$P$ is key-safe and symbolically secure with respect to a trace
property $\rho$ (as checked by ProVerif) then for every polynomial $p$
the following function is negligible in $k$:
$\mathrm{insec}([\![P]\!]_\pi^k, \rho, p(k))$.

Overall, theorems 1 to 3 can be interpreted as follows: let
$P_1, \dots, P_n$ be implementations of protocol participants in CVM
and let $P_E$ be an IML process that describes an execution
environment. Assume that $P_1, \dots, P_n$ are successfully
symbolically executed with resulting models
$\tilde{P}_1, \dots, \tilde{P}_n$, the IML process
$P_E[\tilde{P}_1, \dots, \tilde{P}_n]$ is successfully translated to a
pi process $P_\pi$, and ProVerif successfully verifies $P_\pi$ against
a trace property $\rho$. Then we know by theorem 3 that $P_\pi$ is a
pi protocol model that is (asymptotically) secure with respect to
$\rho$. By theorems 1 and 2 we know that $P_1, \dots, P_n$ form a
secure implementation of the protocol described by $P_\pi$ for the
security parameter $k_0$.

8. IMPLEMENTATION & EXPERIMENTS 

We have implemented our approach and successfully tested 
it on several examples. Our implementation performs the 
conversion from C to CVM at runtime — the C program is 
Figure 10: Summary of analysed implementations.

read(conn_fd, temp, 128);
// BN_hex2bn expects zero-terminated string
temp[128] = 0;
BN_hex2bn(&cipher_2, temp);
// decrypt and parse cipher_2
// to obtain message fields

Figure 11: A flaw in the CSur example: input may be too short.



instrumented using CIL so that it outputs its own CVM rep- 
resentation when run. This allows us to identify and compile 
the main path of the protocol easily. Apart from information 
about the path taken we do not use any runtime informa- 
tion and we plan to make the analysis fully static in future. 
The idea of instrumenting a program to emit a low-level set 
of instructions for symbolic execution at runtime as well as 
some initial implementation code were borrowed from the 
CREST symbolic execution tool [16]. 

Currently we omit certain memory safety checks and assume that there
are no integer overflows. This allows us to use the more efficient
theory of mathematical integers in Yices, but we are planning to move
to exact bitvector treatment in future.

The implementation comprises about 4600 lines of OCaml code. The
symbolic proxies for over 80 of the cryptographic functions in the
OpenSSL library comprise a further 2000 lines of C code.

Fig. 10 shows a list of protocol implementations on which 
we tested our method. Some of the verified programs did not 
satisfy the conditions of computational soundness (mostly 
because they use cryptographic primitives other than public 
key encryption and signatures supported by the result that 
we rely on [4]), so we list the verification type as "symbolic". 

The "simple mac" is an implementation of a protocol sim- 
ilar to the example in fig. 1. RPC is an implementation of 
the remote procedure call protocol in [8] that authenticates 
a server response to a client using a message authentication 
code. It was written by a colleague without being intended 
for verification using our method, but we were still able to 
verify it without any further modifications to the code. 

The NSL example is an implementation of the Needham- 
Schroeder-Lowe protocol written by us to obtain a fully com- 
putationally sound verification result. The implementa- 
tion is designed to satisfy the soundness conditions listed 
in appendix G (modulo the assumption that the encryption 
used is indeed IND-CCA). Masking the second participant's 
identity check triggers Lowe's attack [32] as expected. Ap- 
pendix H shows the source code and the extracted models. 

The CSur example is the code analysed in a predecessor 
paper on C verification [26]. It is an implementation of a 
protocol similar to Needham-Schroeder-Lowe. During our 
verification attempt we discovered a flaw, shown in fig. 11: 
the received message in buffer temp is being converted to 
a BIGNUM structure cipher_2 without checking that enough 
bytes were received. Later a BIGNUM structure derived from 



unsigned char session_key[256 / 8];
// Use the 4 first bytes as a pad
// to encrypt the reading
encrypted_reading =
    ((unsigned int) *session_key) ^ *reading;

Figure 12: A flaw in the minexplib code: only one byte of the pad is
used.



cipher_2 is converted to a bitstring without checking that the length
of the bitstring is sufficient to fill the message buffer. In both
cases the code does not make sure that the information in memory
actually comes from the network, which makes it impossible to prove
authentication properties. The CSur example has been verified in [26],
but only for secrecy, and secrecy is not affected by the flaw we
discovered. The code reinterprets network messages as C structures (an
unsafe practice due to architecture dependence), which is not yet
supported by our analysis and so we were not able to verify a fixed
version of it.

The minexplib example is an implementation of a privacy- 
friendly protocol for smart electricity meters [37] developed 
at Microsoft Research. The model that we obtained uncov- 
ered a flaw shown in fig. 12: incorrect use of pointer derefer- 
encing results in three bytes of each four-byte reading being 
sent unencrypted. We found two further flaws: one could 
lead to contents of uninitialised memory being sent on the 
network, the other resulted in being sent (and accepted) 
in place of the actual number of readings. All flaws have 
been acknowledged and fixed. An F# implementation of
the protocol has been previously verified [38], which highlights
the fact that C implementations can be tricky and can
easily introduce new bugs, even for correctly specified and 
proven protocols. The protocol uses low-level cryptographic 
operations such as XOR and modular exponentiation. In 
general it is impossible to model XOR symbolically [40], so 
we could not use ProVerif to verify the protocol, but we are 
investigating the use of CryptoVerif for this purpose. 
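The effect of the flaw in fig. 12 is easy to reproduce outside C. The Python sketch below models the buggy expression, where dereferencing an `unsigned char` pointer yields a single byte that is then widened, next to what the comment in the code suggests was intended. The key bytes, the sample reading and the little-endian layout are assumptions made for the illustration.

```python
import struct

def encrypt_buggy(session_key: bytes, reading: int) -> int:
    # (unsigned int) *session_key reads ONE byte and widens it,
    # so the pad only ever affects the lowest-order byte of the reading.
    pad = session_key[0]
    return pad ^ reading

def encrypt_intended(session_key: bytes, reading: int) -> int:
    # Presumed intent: interpret the first four key bytes as a 32-bit pad
    # (little-endian layout assumed here).
    (pad,) = struct.unpack("<I", session_key[:4])
    return pad ^ reading

key = bytes(range(0xA0, 0xC0))   # hypothetical 32-byte session key
reading = 0x12345678             # hypothetical four-byte meter reading

buggy = encrypt_buggy(key, reading)
intended = encrypt_intended(key, reading)

# The upper three bytes of the reading pass through the buggy version unchanged.
print(hex(buggy), hex(intended))
```

In the buggy variant only the lowest byte differs from the plaintext reading, which matches the observation that three bytes of each four-byte reading are sent unencrypted.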

9. RELATED WORK 

[26] presents the tool CSur for verifying C implementations
of crypto-protocols by transforming them into a decidable
subset of first-order logic. It only supports secrecy properties
and relies on a Dolev-Yao attacker model. It was applied to a
self-made implementation of the Needham-Schroeder protocol.
[18] presents the verification framework ASPIER using
predicate abstraction and model-checking which operates on
a protocol description language where certain C concepts
such as pointers and variable message lengths are manually
abstracted away. In comparison, our method applies directly 
to C code including pointers and thus requires less manual 
effort. [28] presents the C API "DYC" which can be used to 
generate executable protocol implementations of Dolev-Yao 
type cryptographic protocol messages. By generating con- 
straints from those messages, one can use a constraint solver 
to search for attacks. The approach presents significant limi- 
tations on the C code. [39] reports on the Pistachio approach 
which verifies the conformance of an implementation with a 
specification of the communication protocol. It does not 
directly support the verification of security properties. To 
prepare the ground for symbolic analysis of cryptographic 
protocol implementations, [19] reports an extension of the 



KLEE symbolic execution tool. Cryptographic primitives 
can be treated as symbolic functions whose execution anal- 
ysis is avoided. A security analysis is not yet supported. The 
main difference from our work is that [19] treats every byte 
in a memory buffer separately and thus only supports buffers 
of fixed length. [20] shows how to adapt a general-purpose 
verifier to security verification of C code. This approach does 
not have our restriction to non-branching code, on the other 
hand, it requires the code to be annotated (with about one 
line of annotation per line of code) and works in the sym- 
bolic model, requiring the pairing and projection operations 
to be properly encapsulated. 

There is also work on verifying implementations of secu- 
rity protocols in other high-level languages. These do not 
compare directly to the work presented here, since our aim 
is in particular to be able to deal with the intricacies of a 
low-level language like C. The tools FS2PV [10] and FS2CV
translate F# to the process calculi which can be verified by
the tools ProVerif [11] and CryptoVerif [12] with respect to
symbolic and computational models, respectively. They have been
applied to an implementation of TLS [9]. The refinement-type
checker F7 [8] verifies security properties of F# programs
against a Dolev-Yao attacker. Under certain conditions, this
has been shown to be provably computationally sound [6,
23]. [33] reports on a formal verification of a reference
implementation of the TPM's authorization and encrypted
transport session protocols in F#. It also provides a translator
from programs in the functional fragment of F# into
executable C code. [6] gives results on computational soundness
of symbolic analysis of programs in the concurrent lambda 
calculus RCF. [5] reports on a type system for verifying 
crypto-protocol implementations in RCF. With respect to 
Java, [29] presents an approach which provides a Dolev-Yao 
formalization in FOL starting from the program's control- 
flow graph, which can then be verified for security properties 
with automated theorem provers for FOL (such as SPASS). 
[35] provides an approach for translating Java implemen- 
tations into formal models in the LySa process calculus in 
order to perform a security verification. [27] presents an 
application of the ESC/Java2 static verifier to check confor- 
mance of JavaCard applications to protocol models. [22] de- 
scribes verification of cryptographic primitives implemented 
in a functional language Cryptol. CertiCrypt [7] is a frame- 
work for writing machine-checked cryptographic proofs. 

10. CONCLUSION 

We presented methods and tools for the automated veri- 
fication of cryptographic security properties of protocol im- 
plementations in C. More specifically, we provided a com- 
putationally sound verification of weak secrecy and authen- 
tication for (single execution paths of) C code. Despite the 
limitation of analysing single execution paths, the method 
often suffices to prove security of authentication protocols, 
many of which are non-branching. We plan to extend the 
analysis to more sophisticated control flow. 

In future, we aim to provide better feedback in case ver- 
ification fails. In our case this is rather easy to do as sym- 
bolic execution proceeds line by line. If a condition check 
fails for a certain symbolic expression, it is straightforward 
to print out a computation tree for the expression together 
with source code locations in which every node of the tree 
was computed. We plan to implement this feature in the 
future, although so far we found that manual inspection of 
the symbolic execution trace lets us identify problems easily.

APPENDIX

A. C TO CVM— EXAMPLE 

Fig. 13 shows the CVM translation of the example program from fig. 1.
We use abbreviations for some useful instruction sequences: we write
Clear as an abbreviation for Store dummy that stores a value into an
otherwise unused dummy variable. The effect of Clear is thus to remove
one value from the stack. Often we do not need the length of the
result that the instructions Env and Apply place on the stack, so we
introduce the versions Env' and Apply' that discard the length:
Env' v is an abbreviation for Env v; Clear and Apply' v is an
abbreviation for Apply v; Clear. The abbreviation Varsize is supposed
to load the variable width $N$ onto the stack, for instance, on an
architecture with $N = 32$ the meaning of Varsize would be Const i32.
For convenience we write operation arguments of Apply together with
their arities.

During the translation we arbitrarily choose fresh variables $l$,
$x_1$, and $x_2$ for use in the In operations.

// void * key; size_t keylen;
// readenv("k", &key, &keylen);
Env k; Ref keylen; Store;
Ref keylen; Varsize; Load; Malloc;
Ref key; Store;
Ref key; Varsize; Load; Store;
// size_t len;
// read(&len, sizeof(len));
Varsize; In l read; Ref len; Store;
// if (len > 1000) exit();
Const i1000; Ref len; Varsize; Load;
Apply' >/2; Apply' ¬/1; Test;
// void * buf = malloc(len + 2 * 20);
Ref len; Varsize; Load;
Const i2; Const i20;
Apply' */2; Apply' +/2;
Malloc; Ref buf; Store;
// read(buf, len);
Ref len; Varsize; Load; In x1 read;
Ref buf; Varsize; Load; Store;
// mac(buf, len, key, keylen, buf + len);
Ref buf; Varsize; Load;
Ref len; Varsize; Load; Load;
Ref key; Varsize; Load;
Ref keylen; Varsize; Load; Load;
Apply' mac/2;
Ref buf; Varsize; Load;
Ref len; Varsize; Load;
Apply' +/2; Store;
// read(buf + len + 20, 20);
Const i20; In x2 read;
Ref buf; Varsize; Load;
Ref len; Varsize; Load;
Const i20;
Apply' +/2; Apply' +/2; Store;
// if (memcmp(buf + len,
//            buf + len + 20,
//            20) == 0)
Ref buf; Varsize; Load;
Ref len; Varsize; Load; Apply' +/2;
Const i20; Load;
Ref buf; Varsize; Load;
Ref len; Varsize; Load;
Const i20; Apply' +/2; Apply' +/2;
Const i20; Load;
Apply' cmp/2;
Const i0; Apply' ==/2; Test;
// event("accept", buf, len);
Ref buf; Varsize; Load;
Ref len; Varsize; Load; Load;
Event;

Figure 13: Translation of the example C program (fig. 1) into CVM.

B. PROTOCOL TRANSITION SYSTEMS

This section establishes the definition of security that we use in the
paper and gives some sufficient conditions under which a protocol
transformation (as done, for instance, by translating from a
description of a protocol in C to a description in a more abstract
language) preserves security.

In order to define security for a program we first need to define the
protocol that the program implements. The notion of a protocol is
formally captured by a protocol transition system (PTS), defined as
follows: a PTS is a triple $(S, s_I, \to)$, where $S$ is a set,
$s_I \in S$ and $\to$ is a labelled transition relation with
transitions of the form

$$(\eta, s) \xrightarrow{l} \{(\eta_1, s_1), \dots, (\eta_n, s_n)\},$$

where $\eta$ and $\eta_i$ are valuations, $s, s_i \in S$, and the right
hand side is a nonempty multiset. We call a pair $(\eta, s)$ an
executing process and think of $\eta$ as an environment in which $s$
executes. We require that each executing process is of one of the
following types:

• a reading process, in which case all outgoing labels are of the form
  read $b$ with $b \in BS$,

• a control process, in which case all outgoing labels are of the form
  ctr $b$ with $b \in BS$,

• a randomising process, in which case all outgoing labels are of the
  form rnd $b$ with $b \in BS$ and all $b$ have the same length,

• a writing process, in which case there is a single outgoing
  transition with label of the form write $b$ with $b \in BS$,

• an event process, in which case there is a single outgoing
  transition with label of the form event $b$ with $b \in BS$.

We require that the transition relation is deterministically 
computable: there should exist a probabilistic algorithm 
that 

• given a left hand side which is a reading or a control 
process and a label computes the right hand side (in 
particular, the right hand side is uniquely determined), 

• given a left hand side which is a randomising process 
chooses one of the admissible outgoing labels uniformly 
at random and computes the right hand side, 

• given a left hand side which is a writing or an event 
process computes the outgoing label and the right hand 
side, 

• given inputs for which there is no transition, or mal- 
formed inputs, returns "wrong". 

The semantics of languages that we use (CVM and IML) 
will be given as a function from programs to PTS. 

We now define protocol states and show how they evolve. 
Intuitively a protocol state is just a collection of executing 
processes. The attacker repeatedly chooses one of the pro- 
cesses, which is then allowed to perform a transition accord- 
ing to the PTS rules. The executing processes are assigned 
handles so that the attacker can refer to them. A handle is 
a sequence of all observable transitions that have been per- 
formed by the process so far — this way the handle contains 
all the information that the attacker has about a process. 

Formally, an observation is either an integer or one of the 
reading, control, or writing labels. A process history is a 
sequence of observations. A protocol state over a PTS T is 
a partial map from process histories to executing processes 
over T. We extend the transition relation of T to a transition 
relation over protocol states as follows: Let $\mathcal{P}$ be a protocol
state and $h \in \mathrm{dom}(\mathcal{P})$ a process history such that $T$ contains
a transition of the form

$$\mathcal{P}(h) \xrightarrow{\;l\;} \{(\eta_1, s_1), \ldots, (\eta_n, s_n)\}.$$

Let

$$\mathcal{P}' = \mathcal{P}_{-h}\{h\,o\,i \mapsto (\eta_i, s_i) \mid 1 \le i \le n\},$$

where $o = l$ if $l$ is an observation and $o = \varepsilon$ otherwise, and
we use the abbreviation $f_{-x} = f\{x \mapsto \bot\}$. Then there is a
transition $\mathcal{P} \xrightarrow{(c,\,a)} \mathcal{P}'$ between protocol states $\mathcal{P}$ and $\mathcal{P}'$ with
a command $c$ and an action $a$, where

• $c = (h, l)$ and $a = \varepsilon$ if $l$ is a control label or a read
label,

• $c = (h, \varepsilon)$ and $a = l$ if $l$ is a randomising label, a write
label, or an event label.

Given an initial protocol state and a command, the action
and the resulting state are computable by the assumption
that the underlying PTS transitions are computable.
We extend the definition to multiple transitions and write
$\mathcal{P} \xrightarrow{(c_1 \ldots c_n,\; a_1 \ldots a_m)}{}^{*} \mathcal{P}'$ iff there is a sequence of transitions
leading from $\mathcal{P}$ to $\mathcal{P}'$ with commands $c_1, \ldots, c_n$ and actions
$a_1, \ldots, a_m$.
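The lifting of a single PTS transition to protocol states can be sketched as follows. This is only an illustration under assumptions that are not part of the paper: handles are modelled as Python tuples, labels as `(kind, payload)` pairs, processes as opaque values, and the names `step_protocol_state`, `OBSERVABLE` and `ATTACKER_CHOSEN` are hypothetical.

```python
OBSERVABLE = {"read", "ctr", "write"}   # label kinds recorded in handles
ATTACKER_CHOSEN = {"read", "ctr"}       # label kinds supplied in the command


def step_protocol_state(state, h, label, successors):
    """Apply one PTS transition of the process with handle h.

    state      -- dict mapping handles (tuples of observations) to processes
    label      -- (kind, payload) pair, e.g. ("read", b"msg")
    successors -- processes on the right-hand side of the PTS transition
    Returns (new protocol state, command, action).
    """
    kind, _payload = label
    # o = l if l is an observation, and the empty sequence otherwise.
    observation = (label,) if kind in OBSERVABLE else ()
    # Remove h, then add one entry "h o i" per successor process.
    new_state = {k: v for k, v in state.items() if k != h}
    for i, process in enumerate(successors, start=1):
        new_state[h + observation + (i,)] = process
    if kind in ATTACKER_CHOSEN:
        return new_state, (h, label), None   # c = (h, l), a = epsilon
    return new_state, (h, None), label       # c = (h, epsilon), a = l
```

In this sketch `None` stands for the empty command component and the empty action. A parallel composition step, for example, turns the single entry with the empty handle into two entries whose handles end in 1 and 2.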

We shall be interested in the sequence of events raised by 
a protocol in the presence of an attacker. The execution of 
a protocol is defined as follows: 

Definition 2 (Protocol execution) Given a PTS $T = (S, s_I, \to)$
and an interactive probabilistic machine $E$ (an
attacker) we define the execution of the protocol $T$ as a
probabilistic machine $\mathrm{Exec}(T, E)$ that proceeds as follows:

Maintain a protocol state $\mathcal{P}$. Initially $\mathcal{P} = \{\varepsilon \mapsto (\emptyset, s_I)\}$.
Keep receiving commands from the attacker and for each
command $c$

• compute a transition $\mathcal{P} \xrightarrow{(c,\,a)} \mathcal{P}'$ and set $\mathcal{P} := \mathcal{P}'$. If no
such transition exists or if the command is malformed,
terminate,

• if $a = \text{write } b$, send $b$ to the attacker,

• if $a = \text{event } b$, raise event $b$. □

We shall assume that $\mathrm{Exec}(T, E)$ uses the most efficient
algorithm to compute the PTS transitions. For a PTS $T$, an
attacker $E$, and a resource bound $t \in \mathbb{N}$ let $\mathrm{Events}(T, E, t)$ be
the sequence of events raised by the execution of $\mathrm{Exec}(T, E)$
during the first $t$ elementary computation steps (each protocol
transition will typically involve multiple steps). We
define a trace property as a polynomially decidable prefix-closed
set of event sequences. This leads us to the definition
of security for protocols:

Restatement of definition 1 For a PTS $T$, a trace
property $\rho$ and a resource bound $t \in \mathbb{N}$ let

$$\mathrm{insec}(T, \rho, t) = \sup\{\Pr[\mathrm{Events}(T, E, t) \notin \rho] \mid E \text{ attacker},\ |E| \le t\},$$

where $|E|$ is the size of the description of the attacker.

Intuitively $\mathrm{insec}(T, \rho, t)$ measures the success probability
of the most successful attack against $T$ and property $\rho$ when
both the execution time of the attack and the size of the
attacker code are bounded by $t$.

In the following we define a simulation relation on PTS 
that preserves security. This relation will be used as a tool 
to relate the security of a protocol described by a low-level 
CVM program $P$ to the security of a protocol described by a
more abstract IML process $\tilde{P}$ that results from the symbolic
execution of $P$.

In the definition of the simulation relation we shall refer
to a slightly generalised notion of the protocol execution,
parameterised by the initial environment: given a PTS $T$
with an initial state $s_I$, an attacker $E$, and a valuation $\eta$, let
$\mathrm{Exec}_\eta(T, E)$ be the machine that executes like $\mathrm{Exec}(T, E)$,
but starts with $\{\varepsilon \mapsto (\eta, s_I)\}$ as the initial state.

We shall be interested in PTS in which the number of 
steps to reach a certain state is independent of how it is 
reached, as captured by the following definition: 

Definition 3 (History-independent PTS) A PTS $T$ is
called history-independent iff, whenever for any valuation $\eta$
and attackers $E$ and $E'$ the machine $\mathrm{Exec}_\eta(T, E)$ reaches a
protocol state $\mathcal{P}$ in $t$ non-attacker steps and the machine
$\mathrm{Exec}_\eta(T, E')$ reaches $\mathcal{P}$ in $t'$ non-attacker steps, $t = t'$. □

Given a history-independent PTS $T$, a protocol state $\mathcal{P}$
over $T$ and a valuation $\eta$ we say that $T$ reaches $\mathcal{P}$ from $\eta$
in $t$ steps iff $t$ is the number of non-attacker steps in which
$\mathrm{Exec}_\eta(T, E)$ reaches $\mathcal{P}$ for some attacker $E$.

For the PTS defined in this paper we shall ensure history- 
independence by recording enough information in the state 
to be able to reconstruct the set of transitions that lead into 
that state. 

Intuitively we shall say that a PTS $\tilde{T}$ simulates a PTS $T$
when an attacker has a way of playing against $\tilde{T}$ in such a
way that it solicits the same sequence of actions as when
playing against $T$. In other words, given an execution trace
of $T$, it should be feasible to reconstruct an execution trace
of $\tilde{T}$ with the same sequence of actions. The only additional
complication is that the reconstruction should happen on-line,
that is, the translation of a prefix of a trace should
not depend on what follows the prefix. This corresponds to
the fact that the attacker cannot see into the future. We
achieve the on-line property by demanding that there is an
equivalence relation between protocol states of $T$ and $\tilde{T}$ such
that for each transition from $\mathcal{P}$ to $\mathcal{P}'$ in $T$ and a state $\tilde{\mathcal{P}}$
equivalent to $\mathcal{P}$ there is a transition from $\tilde{\mathcal{P}}$ to $\tilde{\mathcal{P}}'$ in $\tilde{T}$ such
that $\tilde{\mathcal{P}}'$ is equivalent to $\mathcal{P}'$. Most of the technicalities of the
definition deal with placing restrictions on the computability
of these transitions.

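Definition 4 (Simulation relation on PTS) For a polynomial
$p$ we say that a PTS $\tilde{T}$ with initial state $\tilde{s}_I$ $p$-simulates
a PTS $T$ with initial state $s_I$, writing $T \lesssim_p \tilde{T}$, iff both $T$ and
$\tilde{T}$ are history-independent and there exists a relation $\lesssim$ between
protocol states of $T$ and protocol states of $\tilde{T}$ and a
partial map $\tau$ from commands to sequences of commands
such that

1. for all valuations $\eta$

$$\{\varepsilon \mapsto (\eta, s_I)\} \lesssim \{\varepsilon \mapsto (\eta, \tilde{s}_I)\},$$

2. if $\mathcal{P} \lesssim \tilde{\mathcal{P}}$ and there exists a transition $\mathcal{P} \xrightarrow{(c,\,a)} \mathcal{P}'$ with a
command $c$ and an action $a$ then there exists a protocol
state $\tilde{\mathcal{P}}'$ of $\tilde{T}$ such that $\mathcal{P}' \lesssim \tilde{\mathcal{P}}'$ and $\tilde{\mathcal{P}} \xrightarrow{(\tau(c),\,a)}{}^{*} \tilde{\mathcal{P}}'$,

3. $\tau(c)$ is computable in $p(|c| + |s_I|)$ steps,

4. if $\mathcal{P} \lesssim \tilde{\mathcal{P}}$ and for some valuation $\eta$ $T$ reaches $\mathcal{P}$ from
$\eta$ in $t$ steps and $\tilde{T}$ reaches $\tilde{\mathcal{P}}$ from $\eta$ in $\tilde{t}$ steps then
$\tilde{t} \le p(t)$. □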

Theorem 4 (Preservation of security by simulation)
For every polynomial $p$ there exists a polynomial $p'$ such that
whenever $T \lesssim_p \tilde{T}$ for PTS $T$ and $\tilde{T}$, for any trace property
$\rho$ and resource bound $t \in \mathbb{N}$

$$\mathrm{insec}(T, \rho, t) \le \mathrm{insec}(\tilde{T}, \rho, p'(t)).$$ □

Proof Let $T \lesssim_p \tilde{T}$ for PTS $T$ and $\tilde{T}$ and a polynomial $p$.
Given an attacker $E$ we shall construct an attacker $\tilde{E}$ such
that whenever the machine $\mathrm{Exec}(T, E)$ produces a sequence
of events $es$ within the first $t$ steps when running with random
tape $R$, the machine $\mathrm{Exec}(\tilde{T}, \tilde{E})$ produces the sequence
$es$ within at most $p'(t)$ steps when running with $R$, where
$p'$ is a polynomial depending on $p$. Thus, given that $\rho$ is
defined to be prefix-closed, any violation of $\rho$ happening in
$T$ will happen in $\tilde{T}$ with at least the same probability.

The attacker $\tilde{E}$ shall run an instance of $E$ and iterate as
follows:

• Receive a sequence $c_1 \ldots c_m$ of commands from $E$ and
output $\tau(c_1) \ldots \tau(c_m)$,

• Forward any input to $E$.

Let $M$ be the state of the machine $\mathrm{Exec}(T, E)$ running
with random tape $R$ after having processed commands $c_1 \ldots c_n$
and $\tilde{M}$ the state of the machine $\mathrm{Exec}(\tilde{T}, \tilde{E})$ running with $R$
after having processed commands $\tau(c_1) \ldots \tau(c_n)$. By induction
using (1)-(2) in definition 4 we can show:

• if $\mathcal{P}$ is the protocol state contained in $M$ and $\tilde{\mathcal{P}}$ is the
protocol state contained in $\tilde{M}$ then $\mathcal{P} \lesssim \tilde{\mathcal{P}}$,

• the instance of $E$ run by $\tilde{E}$ in $\tilde{M}$ has executed the
same computations as the instance of $E$ in $M$, the same
portion of $R$ has been consumed, and the same sequence
of events has been raised.

To bound the execution time of $\tilde{M}$ assume that $M$ has
executed $t$ steps and $\tilde{M}$ has executed $\tilde{t}$ steps. Let $t = t_E + t_T$,
where $t_E$ is the number of steps executed by the attacker and
$t_T$ is the number of non-attacker steps. Similarly split
$\tilde{t} = \tilde{t}_E + \tilde{t}_T$. The attacker $\tilde{E}$ runs an instance of $E$ which takes
time $O(t_E)$ and additionally issues $n$ queries to $\tau$. According
to (3) in definition 4 the runtime of each query is bounded by
$p(|c_{max}| + |s_I|)$, where $c_{max}$ is the longest command received
from $E$. Both $n$ and $|c_{max}|$ are bounded by $t_E$ and $|s_I|$ is
bounded by $t_T$ as $\mathrm{Exec}(T, E)$ needs to construct the initial
state. Overall

$$\tilde{t}_E \le O(t_E) + n \cdot p(|c_{max}| + |s_I|) \le O(t_E) + t_E \cdot p(t_E + t_T).$$

According to (4) in definition 4 $\tilde{t}_T \le p(t_T)$. We conclude
that $\tilde{t} \le t \cdot p(t) + p(t) + O(t)$. ■
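The command-translation part of the attacker constructed in the proof can be sketched as below. This is a simplified illustration, not the construction itself: it ignores the forwarding of inputs back to the wrapped attacker, it models the command stream as a Python iterable, and the names `translated_commands` and `tau` (a function returning a list of commands, or `None` where the partial map is undefined) are assumptions made for the example.

```python
def translated_commands(attacker_commands, tau):
    """Yield the commands issued by the simulating attacker.

    attacker_commands -- iterable of commands produced by the original attacker
    tau               -- partial translation map: command -> list of commands,
                         or None where it is undefined
    """
    for c in attacker_commands:
        translated = tau(c)
        if translated is None:
            # tau is partial: stop when a command has no translation.
            return
        # One original command may expand into several commands.
        yield from translated
```

Because each command is translated as soon as it arrives, the translation of a prefix never depends on later commands, which is the on-line property required of the simulation.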

We shall be interested in executing a PTS in the context 
of another PTS. This is useful for modelling: we shall spec- 
ify the threat model for a CVM program by embedding it 
as a subprocess within an IML process. This way we can 
formally define a setting with multiple threads and shared 
key creation and distribution without having to add process 
control primitives to CVM itself. An important property of 
embedding that we define is that it preserves the simula- 
tion relation. In order to define the embedding we start by 
adding holes to PTS: 

Definition 5 (PTS with a hole) Given a polynomial $p$
we define a PTS with a hole identifiable in $p$-time as a
history-independent PTS with initial state $s_I$ that contains
a special state $[\,]$ such that there are no transitions from $[\,]$
and such that there exists an algorithm that, given a process
history $h$, runs in time $p(|h| + |s_I|)$ and decides whether
$h$ is a history of a hole, that is, whether for all protocol
states $\mathcal{P}$ reachable from some environment $\eta$ and such that
$h \in \mathrm{dom}(\mathcal{P})$ the process $\mathcal{P}(h)$ is of the form $(\eta', [\,])$ with
some environment $\eta'$. □

The definition intuitively states that the attacker must 
have an efficient means to decide whether a process is a hole 
given the observable history of the process. We can now 
proceed to defining the embedding: 

Definition 6 (Embedding of PTS) Given a PTS with a
hole $T_E = (S_E, s_{IE}, \to_E)$ and a PTS $T = (S, s_I, \to)$ we
define the embedding $T_E[T]$ of $T$ within $T_E$ by

$$T_E[T] = (((S_E \setminus \{[\,]\}) \times \{s_I\}) \cup S,\ (s_{IE}, s_I),\ \to'_E \cup \to),$$

where $\to'_E$ is obtained from $\to_E$ by replacing each occurrence
of $s \in S_E \setminus \{[\,]\}$ by $(s, s_I)$ and by replacing each occurrence of
$[\,]$ with $s_I$. □

Theorem 5 (Simulation and embedding) For each two
polynomials $p$ and $p'$ there exists a polynomial $p''$ such that
if $T$ and $\tilde{T}$ are PTS with $T \lesssim_p \tilde{T}$ and $T_E$ is a PTS with a
hole identifiable in $p'$-time then $T_E[T] \lesssim_{p''} T_E[\tilde{T}]$. □

Proof We start by giving a definition of embedding for 
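protocol states. Given a protocol state $\mathcal{P}$ that contains holes
with histories $h_1, \ldots, h_n$ and protocol states $\mathcal{P}_1, \ldots, \mathcal{P}_n$ we
define the embedding

$$\mathcal{P}[\mathcal{P}_1, \ldots, \mathcal{P}_n] = \mathcal{P}\{h_i h_j \mapsto \mathcal{P}_i(h_j) \mid 1 \le i \le n,\ h_j \in \mathrm{dom}(\mathcal{P}_i)\}.$$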



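Let $T_E = (S_E, s_{IE}, \to_E)$, $T = (S, s_I, \to)$, and $\tilde{T} = (\tilde{S}, \tilde{s}_I, \tilde{\to})$
be defined as in the theorem. We show how to extend the
relation $\lesssim$ on protocol states and the function $\tau$ given by
the definition of simulation relation of $T$ and $\tilde{T}$ to a corresponding
relation $\lesssim_E$ and a function $\tau_E$ for $T_E[T]$ and
$T_E[\tilde{T}]$. For a protocol state $\mathcal{P}$ over $T_E[T]$ and $\tilde{\mathcal{P}}$ over $T_E[\tilde{T}]$
we set $\mathcal{P} \lesssim_E \tilde{\mathcal{P}}$ iff there exist protocol states $\mathcal{P}_1, \ldots, \mathcal{P}_n$ over
$T$, $\tilde{\mathcal{P}}_1, \ldots, \tilde{\mathcal{P}}_n$ over $\tilde{T}$ and a protocol state $\mathcal{P}_E$ over $T_E$ such
that $\mathcal{P}_i \lesssim \tilde{\mathcal{P}}_i$ for all $i$ and

$$\mathcal{P} = \mathcal{P}_E[\mathcal{P}_1, \ldots, \mathcal{P}_n] \quad\text{and}\quad \tilde{\mathcal{P}} = \mathcal{P}_E[\tilde{\mathcal{P}}_1, \ldots, \tilde{\mathcal{P}}_n].$$

Let a command $c = (h, d)$ be given. We compute $\tau_E(c)$
as follows: first check whether $h$ contains a prefix $h_E$ such
that $h_E$ is a history of a hole. If it doesn't, set $\tau_E(c) = c$,
otherwise let $h'$ be a process history such that $h = h_E h'$ and
let

$$(h_1, d_1), \ldots, (h_m, d_m) = \tau((h', d)),$$
$$\tau_E(c) = (h_E h_1, d_1), \ldots, (h_E h_m, d_m).$$

It is straightforward to check that (1)-(2) in definition 4 are
satisfied for $T_E[T]$ and $T_E[\tilde{T}]$ with $\tau_E$ and $\lesssim_E$.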
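The computation of the extended translation function can be sketched as follows. The sketch assumes that process histories are Python tuples, that the hole-detection algorithm of definition 5 is available as a predicate `is_hole_history`, and that the inner translation `tau` is defined on the commands it is given; these names and the tuple encoding are hypothetical.

```python
def tau_E(command, is_hole_history, tau):
    """Translate a command for the embedding T_E[T] into commands for T_E[T~].

    command         -- pair (h, d) with h a tuple of observations
    is_hole_history -- predicate deciding whether a history is that of a hole
    tau             -- translation map of the embedded PTS
    """
    h, d = command
    # Run the hole-detection algorithm on every prefix of h.
    for k in range(len(h) + 1):
        h_E, h_rest = h[:k], h[k:]
        if is_hole_history(h_E):
            # The command addresses a process inside the hole: translate the
            # inner part and re-attach the hole prefix to every result.
            return [(h_E + h_i, d_i) for (h_i, d_i) in tau((h_rest, d))]
    # No prefix is a hole history: the command addresses T_E itself.
    return [command]
```

The loop makes at most `len(h) + 1` calls to the hole-detection predicate, which is the source of the factor $|h|$ in the running-time bound for item (3).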

To prove (3) we need to bound the evaluation time of $\tau_E(c)$
for a command $c = (h, d)$ in terms of $|c|$ and $|s'_{IE}|$ where
$s'_{IE} = (s_{IE}, s_I)$ is the initial state of $T_E[T]$. To evaluate
$\tau_E(c)$ the following operations are performed:

• Run the hole-detection algorithm for each prefix of $h$.
According to the assumption on $T_E$ this can be done in
$|h| \cdot p'(|h| + |s_{IE}|)$ steps.

• If $h = h_E h'$, where $h_E$ is a history of a hole, evaluate
$\tau(c')$ for $c' = (h', d)$. According to the assumption that
$T \lesssim_p \tilde{T}$ this takes $p(|c'| + |s_I|)$ steps.

Given that $|h| \le |c|$, $|c'| \le |c|$, and $|s_I| \le |s'_{IE}|$, the overall
evaluation time of $\tau_E$ is bounded by

$$O(|c| \cdot p'(|c| + |s'_{IE}|) + p(|c| + |s'_{IE}|)).$$

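To prove (4) choose a valuation $\eta$ and assume that $T_E[T]$
reaches a state $\mathcal{P}$ from $\eta$ in $t$ steps, $T_E[\tilde{T}]$ reaches a state $\tilde{\mathcal{P}}$
from $\eta$ in $\tilde{t}$ steps, and $\mathcal{P} \lesssim_E \tilde{\mathcal{P}}$. By definition the states are
of the form

$$\mathcal{P} = \mathcal{P}_E[\mathcal{P}_1, \ldots, \mathcal{P}_n] \quad\text{and}\quad \tilde{\mathcal{P}} = \mathcal{P}_E[\tilde{\mathcal{P}}_1, \ldots, \tilde{\mathcal{P}}_n],$$

where $\mathcal{P}_E$ is a state of $T_E$ and $\mathcal{P}_i \lesssim \tilde{\mathcal{P}}_i$ for all $i$. For each $i$
let $\eta_i$ be the environment of the $i$th hole in $\mathcal{P}_E$. It is easy
to see that $t = O(t_E + t_1 + \ldots + t_n)$, where $t_E$ is the time
in which $T_E$ reaches $\mathcal{P}_E$ from $\eta$ and $t_i$ is the time in which
$T$ reaches $\mathcal{P}_i$ from $\eta_i$. Similarly $\tilde{t} = O(t_E + \tilde{t}_1 + \ldots + \tilde{t}_n)$,
where $\tilde{t}_i$ is the time in which $\tilde{T}$ reaches $\tilde{\mathcal{P}}_i$ from $\eta_i$. From
the assumption $T \lesssim_p \tilde{T}$ we know that $\tilde{t}_i \le p(t_i)$ for each $i$.
Assuming w.l.o.g. that $p$ is at least linear and monotonic,
we conclude $\tilde{t} \le p(t)$. ■

The definition and the theorem can easily be extended to
the setting with multiple holes $[\,]_1, \ldots, [\,]_n$. We shall write
$T_E[T_1, \ldots, T_n]$ to denote the corresponding embedding.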

C. SEMANTICS OF CVM 

This section presents the formal semantics of the CVM
language, the syntax of which is introduced in fig. 3. In the
following, let $N$, val, and bs be chosen as in section 2. In
order to define the semantics, we associate to each CVM
program the protocol transition system that is generated by
it. Let a program $P \in \mathrm{CVM}$ be given. Let $\mathrm{var}(P)$ be the set
of variables used in Ref instructions within $P$ and choose an
allocation function $\mathrm{addr}\colon \mathrm{var}(P) \to \mathbb{N}$. We require that the
allocated memory ranges do not overlap, that is

$$\{\mathrm{addr}(v)\}_N \cap \{\mathrm{addr}(v')\}_N = \emptyset \quad\text{for all } v \ne v'.$$
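The non-overlap requirement on the allocation function can be checked mechanically. The sketch below assumes that $\{p\}_N$ denotes the range of $N$ consecutive addresses starting at $p$, and models `addr` as a Python dictionary from variable names to start addresses; the function name `ranges_disjoint` is hypothetical.

```python
def ranges_disjoint(addr, width):
    """Return True iff the ranges [addr[v], addr[v] + width) are pairwise disjoint.

    addr  -- dict mapping each stack variable to its start address
    width -- size N of the range reserved for each variable
    """
    starts = sorted(addr.values())
    # After sorting it suffices to compare neighbouring ranges.
    return all(b - a >= width for a, b in zip(starts, starts[1:]))
```

Two distinct variables mapped to the same address fail the check, as do ranges that share even a single cell.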

We let $[\![P]\!]_C$ be the PTS with the initial state $(\mathrm{Init}, P)$ and
all other states of the form $(\mathcal{A}^c, \mathcal{M}^c, \mathcal{S}^c, P)$, as described
in section 2. The transition rules of $[\![P]\!]_C$ are presented
in fig. 14. The right hand side of each transition always
contains a single process, so we omit the multiset bracket.

The rule (C-In) stores the input value in the environment
in addition to placing it on the stack. This way the resulting
PTS is history-independent: the state contains the information
about all inputs so that there is only one trace leading
to each state.
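The memory rules (C-Store) and (C-Load) of fig. 14 can be illustrated with a small sketch. It makes several simplifying assumptions that are not in the paper: memory is byte-addressed rather than bit-addressed, addresses are plain Python integers, the allocated set and the memory are a `set` and a `dict`, and the bitstring that supplies values for uninitialised cells in (C-Load) is passed in as `filler`. The function names are hypothetical.

```python
def store(allocated, memory, p, b):
    """(C-Store): write the bytes b at address p.

    Returns the updated memory, or None if part of the target range is not
    allocated (no transition exists in that case).
    """
    if not all(p + i in allocated for i in range(len(b))):
        return None
    new_memory = dict(memory)
    for i, byte in enumerate(b):
        new_memory[p + i] = byte
    return new_memory


def load(memory, p, length, filler):
    """(C-Load): read `length` bytes starting at address p.

    Initialised cells are read from memory; every other cell takes the
    corresponding byte of `filler`, which must be at least `length` long.
    """
    if length > len(filler):
        return None
    return bytes(memory.get(p + i, filler[i]) for i in range(length))
```

Note that `load` does not consult the allocated set: as in the rule, reading outside initialised memory is permitted and yields values taken from the filler.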

D. SEMANTICS OF IML 

Just as for CVM, the semantics of IML is given as a protocol
transition system. We choose the functions bs and val
as in section 2 and let the function $[\![\cdot]\!]$ be defined as in section 4.
For an IML process $P$ we let $[\![P]\!]_I$ be the PTS with
IML processes as states, with starting state $P$ and transitions
described in fig. 15.

The rules (I-Repl) and (I-Par) are standard replication 
and parallel composition rules from the pi calculus. The 
rule (I-Nonce) is interesting in that it restricts the gener- 
ated nonce to be of a given length. The input rule (I-In) 
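(C-Init)
$$\frac{\forall v \in \mathrm{var}(P) : \{\mathrm{addr}(v)\}_N \subseteq \mathit{Addr}}{\eta, (\mathrm{Init}, P) \xrightarrow{\;\ldots\;} \eta, \left(\bigcup_{v \in \mathrm{var}(P)} \{\mathrm{addr}(v)\}_N,\ \emptyset,\ [\,],\ P\right)}$$

(C-Const)
$$\eta, (\mathcal{A}^c, \mathcal{M}^c, \mathcal{S}^c, \mathrm{Const}\ b;\ P) \xrightarrow{\;\ldots\;} \eta, (\mathcal{A}^c, \mathcal{M}^c, b :: \mathcal{S}^c, P)$$

(C-Ref)
$$\eta, (\mathcal{A}^c, \mathcal{M}^c, \mathcal{S}^c, \mathrm{Ref}\ v;\ P) \xrightarrow{\;\ldots\;} \eta, (\mathcal{A}^c, \mathcal{M}^c, \mathrm{bs}(\mathrm{addr}(v)) :: \mathcal{S}^c, P)$$

(C-Malloc)
$$\frac{p \in BS \quad |p| = N \quad \{\mathrm{val}(p)\}_{\mathrm{val}(l)} \subseteq \mathit{Addr} \setminus \mathcal{A}^c}{\eta, (\mathcal{A}^c, \mathcal{M}^c, l :: \mathcal{S}^c, \mathrm{Malloc};\ P) \xrightarrow{\;\ldots\;} \eta, (\mathcal{A}^c \cup \{\mathrm{val}(p)\}_{\mathrm{val}(l)}, \mathcal{M}^c, p :: \mathcal{S}^c, P)}$$

(C-Load)
$$\frac{b, b_E \in BS \quad |b| = \mathrm{val}(l) \le |b_E| \quad \forall i < |b| : b[i] = \text{if } \mathrm{val}(p) + i \in \mathrm{dom}(\mathcal{M}^c) \text{ then } \mathcal{M}^c(\mathrm{val}(p) + i) \text{ else } b_E[i]}{\eta, (\mathcal{A}^c, \mathcal{M}^c, l :: p :: \mathcal{S}^c, \mathrm{Load};\ P) \xrightarrow{\;\ldots\;} \eta, (\mathcal{A}^c, \mathcal{M}^c, b :: \mathcal{S}^c, P)}$$

(C-In)
$$\frac{b \in BS \quad |b| = \mathrm{val}(l) < 2^N}{\eta, (\mathcal{A}^c, \mathcal{M}^c, l :: \mathcal{S}^c, \mathrm{In}\ v\ src;\ P) \xrightarrow{\;\text{read } b\;} \eta\{v \mapsto b\}, (\mathcal{A}^c, \mathcal{M}^c, b :: \mathcal{S}^c, P)}$$

(C-Env)
$$\frac{v \in \mathrm{dom}(\eta) \quad |\eta(v)| < 2^N}{\eta, (\mathcal{A}^c, \mathcal{M}^c, \mathcal{S}^c, \mathrm{Env}\ v;\ P) \xrightarrow{\;\ldots\;} \eta, (\mathcal{A}^c, \mathcal{M}^c, \mathrm{bs}(|\eta(v)|) :: \eta(v) :: \mathcal{S}^c, P)}$$

(C-Apply)
$$\frac{\mathrm{ar}(op) = n \quad b = A_{op}(b_1, \ldots, b_n) \ne \bot \quad |b| < 2^N}{\eta, (\mathcal{A}^c, \mathcal{M}^c, b_1 :: \ldots :: b_n :: \mathcal{S}^c, \mathrm{Apply}\ op;\ P) \xrightarrow{\;\ldots\;} \eta, (\mathcal{A}^c, \mathcal{M}^c, \mathrm{bs}(|b|) :: b :: \mathcal{S}^c, P)}$$

(C-Out)
$$\eta, (\mathcal{A}^c, \mathcal{M}^c, b :: \mathcal{S}^c, \mathrm{Out}\ dest;\ P) \xrightarrow{\;\text{write } b\;} \eta, (\mathcal{A}^c, \mathcal{M}^c, \mathcal{S}^c, P)$$

(C-Test)
$$\frac{b = i1}{\eta, (\mathcal{A}^c, \mathcal{M}^c, b :: \mathcal{S}^c, \mathrm{Test};\ P) \xrightarrow{\;\ldots\;} \eta, (\mathcal{A}^c, \mathcal{M}^c, \mathcal{S}^c, P)}$$

(C-Store)
$$\frac{\{\mathrm{val}(p)\}_{|b|} \subseteq \mathcal{A}^c}{\eta, (\mathcal{A}^c, \mathcal{M}^c, p :: b :: \mathcal{S}^c, \mathrm{Store};\ P) \xrightarrow{\;\ldots\;} \eta, (\mathcal{A}^c, \mathcal{M}^c\{\mathrm{val}(p) + i \mapsto b[i] \mid i < |b|\}, \mathcal{S}^c, P)}$$

Figure 14: The concrete semantics of CVM.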



does not place such a restriction and allows the input to be 
of any length (this is more permissive than the CVM input 
rule). The rules (I-Out) and (I-Event) generate an output 
and an event transition respectively. The conditional rules 
(I-Cond-True) and (I-Cond-False) have different control la- 
bels, so that the attacker can distinguish the branch that 
has been taken by the process. Unlike CVM there is no explicit
rule for reading environment variables, because IML
operates on the environment $\eta$ directly.

Consider IML enriched with an additional syntactic form
$[\,]_i$ (a hole) with $i \in \mathbb{N}$ and without any reductions. For
an IML process $P$ with holes the semantics $[\![P]\!]_I$ is a PTS
with holes (definition 5). The history of a process uniquely
determines its state, for instance, given the process
$P = {!}(\text{if } e \text{ then } [\,] \text{ else } 0)$ and history $h = (\text{ctr } \varepsilon)\ 1\ (\text{ctr } 1)\ 1$, it
is easy to see that $h$ is a history of a hole in $[\![P]\!]_I$. Here it is
important that the true and the false branches in fig. 15 have
different control labels. In general, whether $h$ is a history of
a hole in $[\![P]\!]_I$ is computable in time linear in $|P| + |h|$. Just
like CVM, IML is history-independent because it records all
the inputs in the environment. Thus the following holds:

Lemma 1 (IML with holes) For an IML process $P$ with
holes the semantics $[\![P]\!]_I$ is a PTS with holes identifiable in
$p$-time for some fixed linear polynomial $p$. □

The semantics of mixed IML and CVM processes is defined
by using a PTS embedding as follows:

Definition 7 (Mixed semantics) For a process $P_E \in \mathrm{IML}$
with $n$ holes and processes $P_1, \ldots, P_n \in \mathrm{CVM}$ let

$$[\![P_E[P_1, \ldots, P_n]]\!]_{CI} = [\![P_E]\!]_I\,[[\![P_1]\!]_C, \ldots, [\![P_n]\!]_C].$$
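(I-Repl)
$$(\eta, {!}P) \xrightarrow{\;\text{ctr } \varepsilon\;} \{(\eta, P), (\eta, {!}P)\}$$

(I-Par)
$$(\eta, P \mid Q) \xrightarrow{\;\ldots\;} \{(\eta, P), (\eta, Q)\}$$

(I-Nonce)
$$\frac{b \in BS \quad |b| = \mathrm{val}([\![e]\!]_\eta)}{(\eta, (\nu x[e]);\ P) \xrightarrow{\;\text{rnd } b\;} \{(\eta\{x \mapsto b\}, P)\}}$$

(I-In)
$$\frac{b \in BS}{(\eta, \mathrm{in}(x);\ P) \xrightarrow{\;\text{read } b\;} \{(\eta\{x \mapsto b\}, P)\}}$$

(I-Out)
$$\frac{b = [\![e]\!]_\eta \ne \bot}{(\eta, \mathrm{out}(e);\ P) \xrightarrow{\;\text{write } b\;} \{(\eta, P)\}}$$

(I-Event)
$$\frac{b = [\![e]\!]_\eta \ne \bot}{(\eta, \mathrm{event}(e);\ P) \xrightarrow{\;\text{event } b\;} \{(\eta, P)\}}$$

(I-Cond-True)
$$\frac{\ldots}{(\eta, \text{if } e \text{ then } P \text{ else } Q) \xrightarrow{\;\text{ctr } 1\;} \{(\eta, P)\}}$$

(I-Cond-False)
$$\frac{\ldots}{(\eta, \text{if } e \text{ then } P \text{ else } Q) \xrightarrow{\;\ldots\;} \{(\eta, Q)\}}$$

(I-Let-True)
$$\frac{b = [\![e]\!]_\eta \ne \bot}{(\eta, \text{let } x = e \text{ in } P \text{ else } Q) \xrightarrow{\;\ldots\;} \{(\eta\{x \mapsto b\}, P)\}}$$

(I-Let-False)
$$\frac{[\![e]\!]_\eta = \bot}{(\eta, \text{let } x = e \text{ in } P \text{ else } Q) \xrightarrow{\;\ldots\;} \{(\eta, Q)\}}$$

Figure 15: The semantics of IML.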



E. SIMPLIFICATIONS 

Fig. 16 presents the simplification rules used in our symbolic
execution algorithm. The simplification function is
concerned with simplifying range expressions when possible,
for instance, an expression of the form $(a|b)\{x, y\}$, where
$\Sigma \vdash (x = \mathrm{getLen}(a))$ and $\Sigma \vdash (y = \mathrm{getLen}(b))$, will simplify
to $b$. The main work is done by two recursive functions
$\mathrm{cutL}, \mathrm{cutR}\colon SExp \times SExp \to SExp$ that given a length expression
$l$ and a concatenation expression $e$ attempt to split
$e$ at the position given by $l$. If this succeeds, cutL returns
the part of $e$ to the left of the split position and cutR returns
the part to the right.

In order to simplify an expression of the form $e\{e_o, e_l\}$ the
function simplify first checks two special cases: if $e_o$ is equal
to zero and $e_l$ is equal to the length of $e$ then the range
can be removed and the expression can be simplified to just
$e$. On the other hand if $e_l$ is equal to zero then the range
expression can be simplified to the empty bitstring $\varepsilon$. If $e$ is itself a range expression
of the form $e'\{e'_o, e'_l\}$ then the two ranges are merged
giving the result $e'\{e_o +_N e'_o, e_l\}$. If $e$ is a concatenation then
the functions cutR and cutL are applied. Finally, if all of
the above fails, the original expression is returned without
simplification.
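The interplay of cutR and cutL on a concatenation can be illustrated with a small sketch. It departs from the paper in one important way, which is an assumption of the example: all lengths are concrete integers, so the split position can always be located and the "otherwise" cases never arise, whereas the actual functions work on symbolic lengths and must fall back to an unsimplified range when the facts in $\Sigma$ do not determine the position. The class `Piece` and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Piece:
    """The range name{off, length} of an atomic bitstring called `name`."""
    name: str
    off: int
    length: int


def cut_left(l: int, parts: List[Piece]) -> List[Piece]:
    """Return the first l units of the concatenation `parts`."""
    out = []
    for p in parts:
        if l <= 0:
            break
        take = min(l, p.length)
        out.append(Piece(p.name, p.off, take))
        l -= take
    return out


def cut_right(l: int, parts: List[Piece]) -> List[Piece]:
    """Return what remains of `parts` after dropping its first l units."""
    out = []
    for p in parts:
        skip = min(l, p.length)
        if skip < p.length:
            out.append(Piece(p.name, p.off + skip, p.length - skip))
        l -= skip
    return out


def simplify_range(parts: List[Piece], pos: int, length: int) -> List[Piece]:
    """Simplify (parts){pos, length} as cutL(length, cutR(pos, parts))."""
    return cut_left(length, cut_right(pos, parts))
```

With `a` of length 4 and `b` of length 6, `simplify_range([a, b], 4, 6)` returns just `b`, mirroring the example $(a|b)\{x, y\}$ from the text.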



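$$\mathrm{cutL}_\Sigma(l,\ e_1 | \ldots | e_n) = \begin{cases} e_1 | \ldots | e_{i-1} |\, \mathrm{simplify}_\Sigma(\mathrm{cutL}_\Sigma(l -_N l', e_i)) & \text{if } \Sigma \vdash (l \ge l') \wedge (l \le l' +_N \mathrm{getLen}(e_i)), \\ & \text{where } l' = \sum_{j=1}^{i-1} \mathrm{getLen}(e_j), \\ (e_1 | \ldots | e_n)\{i0, l\} & \text{otherwise,} \end{cases}$$

$$\mathrm{cutR}_\Sigma(l,\ e_1 | \ldots | e_n) = \begin{cases} \mathrm{simplify}_\Sigma(\mathrm{cutR}_\Sigma(l -_N l', e_i)) \,|\, e_{i+1} | \ldots | e_n & \text{if } \Sigma \vdash (l \ge l') \wedge (l \le l' +_N \mathrm{getLen}(e_i)), \\ & \text{where } l' = \sum_{j=1}^{i-1} \mathrm{getLen}(e_j), \\ (e_1 | \ldots | e_n)\{l,\ \mathrm{getLen}(e_1 | \ldots | e_n) -_N l\} & \text{otherwise,} \end{cases}$$

$$\mathrm{simplify}_\Sigma(e\{e_o, e_l\}) = \begin{cases} e & \text{if } \Sigma \vdash (e_o = i0) \wedge (e_l = \mathrm{getLen}(e)), \\ \varepsilon & \text{if } \Sigma \vdash (e_l = i0), \\ e'\{e_o +_N e'_o, e_l\} & \text{if } e = e'\{e'_o, e'_l\}, \\ \mathrm{cutL}_\Sigma(e_l, \mathrm{cutR}_\Sigma(e_o, e)) & \text{if } e \text{ is a concatenation,} \\ e\{e_o, e_l\} & \text{otherwise.} \end{cases}$$

Figure 16: Simplification rules.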



We omit the soundness proof for our simplification func- 
tion. 



F. SYMBOLIC EXECUTION SOUNDNESS 

We prove our main result (theorem 1). We shall do so
by showing that the PTS $[\![[\![P]\!]_S]\!]_I$ resulting from the symbolic
execution of a program $P \in \mathrm{CVM}$ simulates (in the
sense of definition 4) the PTS $[\![P]\!]_C$ resulting from running
$P$ directly. This result is captured by lemma 4. Theorem 1
then follows by combined application of theorems 4 and 5
and lemma 1 together with definition 7.

For compactness we shall write $b^{\mathbb{N}}$ instead of $\mathrm{val}(b)$ for
$b \in BS$. When referring to valuations we shall mean extended
valuations of the form $\eta\colon Var \cup PBase \to BS_\bot$. For
an extended valuation $\eta$ let $\mathrm{var}(\eta)$ be the restriction of $\eta$ to
$Var$.

We shall make use of the soundness of the function getLen
introduced in section 6 that we state here without proof: for
any $e \in SExp$ and valuation $\eta$

$$([\![e]\!]_\eta \ne \bot) \wedge (|[\![e]\!]_\eta| < 2^N) \implies [\![\mathrm{getLen}(e)]\!]_\eta = \mathrm{bs}(|[\![e]\!]_\eta|).$$

The main tool in the proof of lemma 4 is a concretisation
function that, given a valuation, maps symbolic execution
states to concrete execution states. Given a symbolic state
$s = (\Sigma, \mathcal{A}^s, \mathcal{M}^s, \mathcal{S}^s, P)$ and a valuation $\eta$ we say that $s$ is
$\eta$-consistent when all expressions in $s$ are well-defined with
respect to $\eta$, when $\eta$ maps all symbolic memory locations
to disjoint ranges that are within allocated memory bounds,
all conditions in $\Sigma$ hold with respect to $\eta$, and $\eta$ agrees with
the addr function for stack variables. Formally, we say that
$s$ is $\eta$-consistent, iff

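1. for all $pb \in \mathrm{dom}(\mathcal{M}^s)$: $|[\![\mathcal{M}^s(pb)]\!]_\eta| \le [\![\mathcal{A}^s(pb)]\!]_\eta^{\mathbb{N}}$,

2. for all $pb, pb' \in \mathrm{dom}(\mathcal{A}^s)$ with $pb \ne pb'$:
$\{\eta(pb)^{\mathbb{N}}\}_{[\![\mathcal{A}^s(pb)]\!]_\eta^{\mathbb{N}}} \cap \{\eta(pb')^{\mathbb{N}}\}_{[\![\mathcal{A}^s(pb')]\!]_\eta^{\mathbb{N}}} = \emptyset$,

3. for all $\psi \in \Sigma$: $[\![\psi]\!]_\eta = i1$,

4. for all $e \in \mathcal{S}^s$: $[\![e]\!]_\eta \ne \bot$,

5. for all $v \in \mathrm{var}(P)$: $\eta(\mathrm{stack}\ v) = \mathrm{bs}(\mathrm{addr}(v))$.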

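For an $\eta$-consistent state $s$ let $\mathrm{conc}_\eta(s) = (\mathcal{A}^{sc}, \mathcal{M}^{sc}, \mathcal{S}^{sc}, P)$
be the concrete state where $\mathcal{S}^{sc}$ is obtained from $\mathcal{S}^s$ by applying
$[\![\cdot]\!]_\eta$ to each element and

$$\mathcal{A}^{sc} = \bigcup_{pb \in \mathrm{dom}(\mathcal{A}^s)} \{\eta(pb)^{\mathbb{N}}\}_{[\![\mathcal{A}^s(pb)]\!]_\eta^{\mathbb{N}}},$$

$$\mathcal{M}^{sc} = \{\eta(pb)^{\mathbb{N}} + i \mapsto [\![\mathcal{M}^s(pb)]\!]_\eta[i] \mid pb \in \mathrm{dom}(\mathcal{M}^s),\ i < |[\![\mathcal{M}^s(pb)]\!]_\eta|\}.$$

The conditions of $\eta$-consistency guarantee that $\mathcal{M}^{sc}$ is well-defined:
symbolic expressions will map onto concrete memory
without overlapping, that is, for each $p \in \mathbb{N}$ there is only
one pair $pb$, $i$ such that $\eta(pb)^{\mathbb{N}} + i = p$.

The special state $(\mathrm{Init}, P)$ is defined to be $\eta$-consistent
for any $\eta$ with $\mathrm{dom}(\eta) \subseteq Var$ and we let $\mathrm{conc}_\eta(\mathrm{Init}, P) =
(\mathrm{Init}, P)$.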

We start by proving two lemmas relating the symbolic and 
the concrete execution of a program. Lemma 2 shows that if 
a symbolic state s maps to a concrete state c then the state 
following s in the symbolic execution can be mapped to the 
state following c in the concrete execution. Lemma 3 shows 
that if in a symbolic and a concrete execution the states can 
be mapped to each other then the IML program generated 
by the symbolic execution performs the same actions as the 
concrete execution. 

Lemma 2 Let $(\eta_c, c) \xrightarrow{\;l\;} (\eta'_c, c')$ be a concrete transition
(fig. 14), $s \xrightarrow{\;\lambda\;} s'$ a symbolic transition (fig. 8), and $\eta$ an
extension of $\eta_c$ such that $s$ is $\eta$-consistent and $\mathrm{conc}_\eta(s) = c$.
Then there exists an extension $\eta'$ of both $\eta$ and $\eta'_c$ such that
$s'$ is $\eta'$-consistent and $\mathrm{conc}_{\eta'}(s') = c'$. □

Proof By definition of the concretisation function both the
concrete and the symbolic step are executed with the same
instruction or both perform the initialisation. We prove the
lemma by enumerating the pairs of rules that generate the
transitions. For the purpose of this proof we are not interested
in the values of transition labels $l$ and $\lambda$.

In the following $\mathcal{A}^c, \ldots$ and $\mathcal{A}^{c'}, \ldots$ refer to components of
$c$ and $c'$ respectively, $\mathcal{A}^s, \ldots$ and $\mathcal{A}^{s'}, \ldots$ refer to components
of $s$ and $s'$, and $\mathcal{A}^{sc}, \ldots$ and $\mathcal{A}^{sc'}, \ldots$ refer to components of
$\mathrm{conc}_\eta(s)$ and $\mathrm{conc}_{\eta'}(s')$.

1. (C-Init) and (S-Init)

By definition of $\eta$-consistency for the initial state we
know that $\mathrm{stack}\ v \notin \mathrm{dom}(\eta)$ for all $v \in \mathrm{var}(P)$. We
show that the lemma holds with

$$\eta' = \eta\{\mathrm{stack}\ v \mapsto \mathrm{bs}(\mathrm{addr}(v)) \mid v \in \mathrm{var}(P)\}.$$

The second condition of $\eta'$-consistency of $s'$ follows by
the choice of addr function (appendix C), the other conditions
are straightforward to check. In $s'$ each location
in the symbolic memory is initialised to $\varepsilon$, so applying
the definition of $\mathrm{conc}_\eta$ we see that $\mathcal{M}^{sc'} = \emptyset = \mathcal{M}^{c'}$. Finally

$$\mathcal{A}^{sc'} = \bigcup_{v \in \mathrm{var}(P)} \{\eta'(\mathrm{stack}\ v)^{\mathbb{N}}\}_{[\![\mathcal{A}^{s'}(\mathrm{stack}\ v)]\!]_{\eta'}^{\mathbb{N}}} = \bigcup_{v \in \mathrm{var}(P)} \{\mathrm{bs}(\mathrm{addr}(v))^{\mathbb{N}}\}_{\mathrm{bs}(N)^{\mathbb{N}}} = \bigcup_{v \in \mathrm{var}(P)} \{\mathrm{addr}(v)\}_N = \mathcal{A}^{c'}.$$

2. (C-Const) and (S-Const) with Const $b$

Both the concrete and the symbolic transition have the
effect of putting the same bitstring $b$ onto the stack.
Thus both the $\eta$-consistency and the state correspondence
are preserved and the lemma holds with $\eta' = \eta$.

3. (C-Ref) and (S-Ref) with Ref $v$

The concrete transition puts $\mathrm{bs}(\mathrm{addr}(v))$ on the stack
and the symbolic transition puts $\mathrm{ptr}(\mathrm{stack}\ v, i0)$ on the
stack. By $\eta$-consistency $\eta(\mathrm{stack}\ v) = \mathrm{bs}(\mathrm{addr}(v))$, thus

$$[\![\mathrm{ptr}(\mathrm{stack}\ v, i0)]\!]_\eta = \eta(\mathrm{stack}\ v) +_b i0 = \mathrm{bs}(\mathrm{addr}(v))$$

and the lemma holds with $\eta' = \eta$.

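4. (C-Malloc) and (S-Malloc)

Let $p$ and $l$ be defined as in rule (C-Malloc) and $pb$ and
$e_l$ be defined as in rule (S-Malloc). We show that the
lemma holds with $\eta' = \eta\{pb \mapsto p\}$. It is straightforward
to check that the first condition of $\eta'$-consistency of $s'$
holds, taking into consideration that $[\![\mathcal{M}^{s'}(pb)]\!]_{\eta'} = \varepsilon$.
To prove the second condition, let $pb' \in \mathrm{dom}(\mathcal{A}^{s'})$ such
that $pb' \ne pb$. In that case $pb' \in \mathrm{dom}(\mathcal{A}^s)$ and by
definition of $\mathrm{conc}_\eta$ and the state correspondence of $c$
and $s$

$$\{\eta'(pb')^{\mathbb{N}}\}_{[\![\mathcal{A}^{s'}(pb')]\!]_{\eta'}^{\mathbb{N}}} = \{\eta(pb')^{\mathbb{N}}\}_{[\![\mathcal{A}^s(pb')]\!]_\eta^{\mathbb{N}}} \subseteq \mathcal{A}^{sc} = \mathcal{A}^c.$$

By initial state correspondence and the definition of $\eta'$

$$\{\eta'(pb)^{\mathbb{N}}\}_{[\![\mathcal{A}^{s'}(pb)]\!]_{\eta'}^{\mathbb{N}}} = \{p^{\mathbb{N}}\}_{[\![e_l]\!]_\eta^{\mathbb{N}}} = \{p^{\mathbb{N}}\}_{l^{\mathbb{N}}} \subseteq \mathit{Addr} \setminus \mathcal{A}^c.$$

Thus the allocation ranges of $pb$ and $pb'$ are disjoint
and the condition (2) holds. Conditions (3) to (5) are
straightforward to check. To prove that $\mathrm{conc}_{\eta'}(s') = c'$
observe that

$$\mathcal{A}^{sc'} = \mathcal{A}^{sc} \cup \{\eta'(pb)^{\mathbb{N}}\}_{[\![e_l]\!]_{\eta'}^{\mathbb{N}}} = \mathcal{A}^c \cup \{p^{\mathbb{N}}\}_{l^{\mathbb{N}}} = \mathcal{A}^{c'},$$

$$[\![\mathrm{ptr}(pb, i0)]\!]_{\eta'} = \eta'(pb) +_b i0 = p.$$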

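5. (C-Load) and (S-Load)

Both the concrete and the symbolic rule have the effect
of replacing two values on the stack with a new value.
In the concrete transition the new value is $b \in BS$ such
that $b[i] = \mathcal{M}^c(p^{\mathbb{N}} + i)$ whenever $\mathcal{M}^c(p^{\mathbb{N}} + i)$ is initialised
and $p$ is defined as in rule (C-Load). In the symbolic
transition the new value is $e = \mathrm{simplify}_\Sigma(\mathcal{M}^s(pb)\{e_o, e_l\})$,
where $pb$, $e_o$, and $e_l$ are defined as in rule (S-Load). We
shall prove that $[\![e]\!]_\eta = b$ so that the lemma holds with
$\eta' = \eta$.

Let $b_h = [\![\mathcal{M}^s(pb)]\!]_\eta$ and $b_{pb} = \eta(pb)$. By definition of $\mathrm{conc}_\eta$ and initial
state correspondence

$$b_h = \mathcal{M}^{sc}(\{b_{pb}^{\mathbb{N}}\}_{|b_h|}) = \mathcal{M}^c(\{b_{pb}^{\mathbb{N}}\}_{|b_h|}),$$

where we use the notation $\mathcal{M}^c(I)$ for $I \subseteq \mathit{Addr}$ to denote
the sequence of bits of $\mathcal{M}^c$ with addresses in $I$.
Thus $\mathcal{M}^c$ is defined in the range $\{b_{pb}^{\mathbb{N}}\}_{|b_h|}$, in particular

$$b_{pb}^{\mathbb{N}} + |b_h| \le 2^N. \qquad (*)$$

Let $b_o = [\![e_o]\!]_\eta$ and $b_l = [\![e_l]\!]_\eta$. Evaluating the conditions
of the rule (S-Load) and using the assumption
of $\eta$-consistency we obtain $b_o +_N b_l \le [\![\mathrm{getLen}(e_h)]\!]_\eta$.
Because $|b_h| < 2^N$ we can apply soundness of getLen
which together with the definitions of bitstring operations
$+_N$ and $\le$ gives

$$b_o^{\mathbb{N}} + b_l^{\mathbb{N}} \le |b_h|. \qquad (**)$$

Using the definition of the function sub

$$[\![\mathcal{M}^s(pb)\{e_o, e_l\}]\!]_\eta = \mathrm{sub}(b_h, b_o^{\mathbb{N}}, b_l^{\mathbb{N}}) = \mathrm{sub}(\mathcal{M}^c(\{b_{pb}^{\mathbb{N}}\}_{|b_h|}), b_o^{\mathbb{N}}, b_l^{\mathbb{N}}) = \mathcal{M}^c(\{b_{pb}^{\mathbb{N}} + b_o^{\mathbb{N}}\}_{b_l^{\mathbb{N}}}).$$

This allows us to apply soundness of simplify:

$$[\![e]\!]_\eta = [\![\mathrm{simplify}_\Sigma(\mathcal{M}^s(pb)\{e_o, e_l\})]\!]_\eta = [\![\mathcal{M}^s(pb)\{e_o, e_l\}]\!]_\eta = \mathcal{M}^c(\{b_{pb}^{\mathbb{N}} + b_o^{\mathbb{N}}\}_{b_l^{\mathbb{N}}}).$$

By the state correspondence of $c$ and $s$ we obtain

$$p = [\![\mathrm{ptr}(pb, e_o)]\!]_\eta = \eta(pb) +_b [\![e_o]\!]_\eta = b_{pb} +_b b_o, \qquad |b| = [\![e_l]\!]_\eta^{\mathbb{N}} = b_l^{\mathbb{N}}.$$

By (*) and (**) $b_{pb}^{\mathbb{N}} + b_o^{\mathbb{N}} < 2^N$, thus

$$b_{pb}^{\mathbb{N}} + b_o^{\mathbb{N}} = (b_{pb} +_b b_o)^{\mathbb{N}} = p^{\mathbb{N}}.$$

Substituting this into the above we get

$$[\![e]\!]_\eta = \mathcal{M}^c(\{b_{pb}^{\mathbb{N}} + b_o^{\mathbb{N}}\}_{b_l^{\mathbb{N}}}) = \mathcal{M}^c(\{p^{\mathbb{N}}\}_{|b|}) = b.$$

The final equality holds as the referenced memory cells
lie within the initialised range $\{b_{pb}^{\mathbb{N}}\}_{|b_h|}$.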

6. (C-In) and (S-In) with In $v$ $src$

The rule (C-In) takes a value $l$ from the stack and places
a value $b$ of length $l^{\mathbb{N}}$ on the stack. Additionally it
updates $\eta'_c = \eta_c\{v \mapsto b\}$. The rule (S-In) takes an
expression $e_l$ from the stack, places $v$ on the stack, and
adds the fact $\mathrm{len}(v) = e_l$ to $\Sigma$. We show that the
lemma holds with $\eta' = \eta\{v \mapsto b\}$. Due to initial state
correspondence $[\![e_l]\!]_\eta = l$ and due to the condition of
the rule (C-In) $|b| < 2^N$, thus

$$[\![\mathrm{len}(v)]\!]_{\eta'}^{\mathbb{N}} = \mathrm{bs}(|b|)^{\mathbb{N}} = |b| = l^{\mathbb{N}} = [\![e_l]\!]_{\eta'}^{\mathbb{N}},$$

so that the new fact is indeed valid.

7. (C-Env) and (S-Env) with Env $v$

The rule (C-Env) places $\eta_c(v)$ together with $\mathrm{bs}(|\eta_c(v)|)$
on the stack (the valuation $\eta$ in fig. 14 corresponds to $\eta_c$
in the lemma). The rule (S-Env) places $v$ and $\mathrm{len}(v)$ on
the stack. By assumption of the lemma $\eta(v) = \eta_c(v)$,
so it is straightforward to check that the lemma holds
with $\eta' = \eta$.

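8. (C-Apply) and (S-Apply) with Apply $op$

The rule (C-Apply) places on the stack the bitstring
$b = A_{op}(b_1, \ldots, b_n)$ together with its length, whereby
$b_1, \ldots, b_n$ are taken from the stack. The rule (S-Apply)
places on the stack the value $e = \mathrm{apply}(op, e_1, \ldots, e_n)$
together with $\mathrm{len}(e)$, whereby $e_1, \ldots, e_n$ are taken from
the stack. We show that $[\![e]\!]_\eta = b$ so that the lemma
holds with $\eta' = \eta$. By initial state correspondence we
have $[\![e_i]\!]_\eta = b_i$ for all $i$. We enumerate the cases arising
from the definition of apply given $b \ne \bot$:

(a) $n = 2$, $e_1 = \mathrm{ptr}(pb, e_o)$, $e_2 \in IExp$, and $op = +_b$.
In this case

$$\begin{aligned} b &= [\![\mathrm{ptr}(pb, e_o)]\!]_\eta +_b [\![e_2]\!]_\eta \\ &= \eta(pb) +_b [\![e_o]\!]_\eta +_b [\![e_2]\!]_\eta \\ &= [\![\mathrm{ptr}(pb, e_o +_b e_2)]\!]_\eta \\ &= [\![\mathrm{apply}(+_b, \mathrm{ptr}(pb, e_o), e_2)]\!]_\eta \end{aligned}$$

(b) $n = 2$, $e_1 = \mathrm{ptr}(pb, e_o)$, $e_2 = \mathrm{ptr}(pb, e'_o)$, $op = -_b$.
In this case

$$\begin{aligned} b &= [\![\mathrm{ptr}(pb, e_o)]\!]_\eta -_b [\![\mathrm{ptr}(pb, e'_o)]\!]_\eta \\ &= (\eta(pb) +_b [\![e_o]\!]_\eta) -_b (\eta(pb) +_b [\![e'_o]\!]_\eta) \\ &= [\![e_o]\!]_\eta -_b [\![e'_o]\!]_\eta \\ &= [\![\mathrm{apply}(-_b, \mathrm{ptr}(pb, e_o), \mathrm{ptr}(pb, e'_o))]\!]_\eta \end{aligned}$$

(c) $e_1, \ldots, e_n \in IExp$. In this case

$$b = A_{op}([\![e_1]\!]_\eta, \ldots, [\![e_n]\!]_\eta) = [\![op(e_1, \ldots, e_n)]\!]_\eta = [\![\mathrm{apply}(op, e_1, \ldots, e_n)]\!]_\eta$$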

9. (C-Out) and (S-Out)

The lemma holds trivially with $\eta' = \eta$.

10. (C-Test) and (S-Test)

The rule (C-Test) removes a value $b$ from the stack.
The rule (S-Test) removes an expression $e$ from the
stack and adds $e$ to the set of facts. We show that the
lemma holds with $\eta' = \eta$. We only need to prove that
$[\![e]\!]_\eta = i1$, but this follows from the assumption of the
lemma that $[\![e]\!]_\eta = b$ and the condition $b = i1$ of the
rule (C-Test).

11. (C-Store) and (S-Store)

Both the concrete and the symbolic transition perform
a memory update. These updates are

$$\mathcal{M}^{s'} = \mathcal{M}^s\{pb \mapsto e'_h\}, \qquad (1)$$

and

$$\mathcal{M}^{c'} = \mathcal{M}^c\{p^{\mathbb{N}} + i \mapsto b[i] \mid i < |b|\},$$

where $p$ and $b$ are defined as in rule (C-Store) and $pb$
and $e'_h$ are defined as in rule (S-Store).
We shall prove that the lemma holds with $\eta' = \eta$. We
start by showing that $s'$ is $\eta$-consistent. As the transition
only updates the memory, we only need to check
that $[\![e'_h]\!]_\eta \ne \bot$ and $|[\![e'_h]\!]_\eta| \le [\![\mathcal{A}^s(pb)]\!]_\eta^{\mathbb{N}}$. Let $e_h$, $e_s$,
$e_{lh}$, $e_l$, $e_o$, $e$, and $pb$ be defined as in (S-Store). For
$e_x \in \{e_h, e_s, e_{lh}, e_l, e_o\}$ let $b_x = [\![e_x]\!]_\eta$ and let $b_{pb} = [\![pb]\!]_\eta$.

By initial state correspondence $[\![e]\!]_\eta = b$. The rule
(C-Store) assumes $\{p^{\mathbb{N}}\}_{|b|} \subseteq \mathcal{A}^c$, which implies $|b| < 2^N$.
Using the soundness of getLen and the definition
of $b_l$ we obtain

$$b_l^{\mathbb{N}} = [\![\mathrm{getLen}(e)]\!]_\eta^{\mathbb{N}} = |[\![e]\!]_\eta| = |b|.$$
By initial state correspondence 

where the second equality follows by soundness of getLen 
and the fact 

6^, + |6h|<2^ (2) 

established by the first equality. 

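We shall distinguish between two cases in the premise
of the rule (S-Store). The first case is

$\Sigma \vdash (e_o +_N e_l \le e_{lh})$, $e'_h = \mathrm{simplify}_\Sigma(e''_h)$, where
$e''_h = e_h\{0, e_o\} \,|\, e \,|\, e_h\{e_o +_N e_l,\; e_{lh} -_N (e_o +_N e_l)\}$.

In this case the same argument as for the rule (S-Load)
yields

$b_o^N + b_l^N \le |b_h| = b_{lh}^N$. (A3)

Substituting the value of $b_l$ and expanding the definition
of the function sub under consideration of (A3) we
obtain

$\llbracket e''_h \rrbracket_\eta = b_h\{0, b_o\} \,|\, b \,|\, b_h\{b_o +_N b_l,\; b_{lh} -_N (b_o +_N b_l)\} \neq \bot$. (A4)

Applying the soundness of the function simplify we get
$\llbracket e'_h \rrbracket_\eta = \llbracket e''_h \rrbracket_\eta \neq \bot$. From $b_l^N = |b|$ follows
$|\llbracket e'_h \rrbracket_\eta| = |\llbracket e_h \rrbracket_\eta|$. By initial state correspondence
$|\llbracket e_h \rrbracket_\eta| \le \llbracket A^s(pb) \rrbracket^N_\eta$.
This proves $\eta$-consistency of $s'$ in the first case.
The second case in the premise of the rule (S-Store) is

$\Sigma \vdash (e_o +_N e_l > e_{lh}) \wedge (e_o \le e_{lh}) \wedge (e_o +_N e_l \le e_s)$,
$e'_h = \mathrm{simplify}_\Sigma(e_h\{0, e_o\} \,|\, e)$.

Together with $\eta$-consistency of $s$ this implies the following
condition on bitstrings:

$(b_o^N + b_l^N > b_{lh}^N) \wedge (b_o^N \le b_{lh}^N) \wedge (b_o^N + b_l^N \le b_s^N)$. (B3)

This allows us to expand the definition of sub and apply
soundness of simplify to obtain

$\llbracket e'_h \rrbracket_\eta = b_h\{0, b_o\} \,|\, b \neq \bot$. (B4)

Using (B3)

$|\llbracket e'_h \rrbracket_\eta| = b_o^N + |b| = b_o^N + b_l^N \le b_s^N = \llbracket A^s(pb) \rrbracket^N_\eta$,

which proves $\eta$-consistency of $s'$ in the second case.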
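The next step is to show that $\mathrm{conc}_\eta(s') = c'$. Both in
the first and in the second case above $|\llbracket e'_h \rrbracket_\eta| \ge |\llbracket e_h \rrbracket_\eta|$
(in the first case they are equal, in the second case it
follows from (B3)). Comparing the definitions of the memories
$M^{\eta s'}$ and $M^{\eta s}$ of $\mathrm{conc}_\eta(s')$ and $\mathrm{conc}_\eta(s)$ and using the
relation (1) between $M^{s'}$ and $M^s$ we obtain

$M^{\eta s'} = M^{\eta s}\{b_{pb}^N + i \mapsto \llbracket e'_h \rrbracket_\eta[i] \mid i < |\llbracket e'_h \rrbracket_\eta|\}$.

Substituting the value of $\llbracket e'_h \rrbracket_\eta$ from either (A4) or (B4)
and using the assumption $M^{\eta s} = M^c$ from the initial
state correspondence we can simplify this to

$M^{\eta s'} = M^c\{b_{pb}^N + b_o^N + i \mapsto b[i] \mid i < |b|\}$.

By initial state correspondence

$p = \llbracket \mathrm{ptr}(pb, e_o) \rrbracket_\eta = \eta(pb) +_b \llbracket e_o \rrbracket_\eta = b_{pb} +_b b_o$.

It is $b_{pb}^N + b_o^N < 2^N$ both in the first and in the second
case above: in the first case it follows from (2) and (A3),
in the second case it follows from (2) and (B3). This
implies $p^N = b_{pb}^N + b_o^N$. Thus

$M^{\eta s'} = M^c\{p^N + i \mapsto b[i] \mid i < |b|\} = M^{c'}$.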

We call a valuation $\eta'$ minimal with a property $\phi$ iff $\eta'$
satisfies $\phi$ and $\eta'_{-x}$ does not satisfy $\phi$ for all $x \in \mathrm{dom}(\eta')$.

Lemma 3 Let $\eta_c$, $\eta'_c$, $\eta$, and $\eta'$ be valuations and $s$ and $s'$ be
symbolic states such that $s$ is $\eta$-consistent, $s'$ is $\eta'$-consistent,
and there are transitions $(\eta_c, \mathrm{conc}_\eta(s)) \xrightarrow{l} (\eta'_c, \mathrm{conc}_{\eta'}(s'))$ and
$s \xrightarrow{\lambda} s'$ with $\lambda \neq \varepsilon$. Assume additionally that $\eta'$ is a
minimal extension of $\eta$ with the property above. Then for all
$P \in \mathrm{IML}$ the following is a valid IML transition (fig. 15):

$\{(\mathrm{var}(\eta), \lambda; P)\} \xrightarrow{l} \{(\mathrm{var}(\eta'), P)\}$.

Proof We prove the lemma by case distinction over all
pairs of $l$ and $\lambda$ that can occur.

1. $l = \mathrm{read}\ b$ and $\lambda = \mathrm{in}(x)$; by rules (C-In) and (S-In).
From the correspondence between the symbolic and the
concrete transition we obtain $\eta'(x) = b$. Because $\eta'$ was
chosen to be minimal $\eta' = \eta\{x \mapsto b\}$, which also implies
$\mathrm{var}(\eta') = \mathrm{var}(\eta)\{x \mapsto b\}$. The lemma follows by rule
(I-In).

2. $l = \mathrm{rnd}\ b$ and $\lambda = (\nu x[e_l])$; by rules (C-In) and (S-In).
The correspondence between the symbolic and the concrete
transition implies $\mathrm{var}(\eta') = \mathrm{var}(\eta)\{x \mapsto b\}$.
Additionally the correspondence yields $|b| = \llbracket e_l \rrbracket_\eta$, so that
the lemma follows by rule (I-Nonce).

3. $l = \mathrm{write}\ b$ and $\lambda = \mathrm{out}(e)$; by rules (C-Out) and
(S-Out).

From the correspondence between the symbolic and the
concrete transition we obtain $\llbracket e \rrbracket_\eta = b$. Additionally
$\eta' = \eta$ by minimality of $\eta'$. The lemma follows by rule
(I-Out).

4. $l = \mathrm{event}\ b$ and $\lambda = \mathrm{event}(e)$; by rules (C-Out) and
(S-Out).

The proof is exactly analogous to the case above; the
lemma follows by rule (I-Event).

5. $l = \mathrm{ctr}\ 1$ and $\lambda = \mathrm{if}\ e\ \mathrm{then}$; by rules (C-Test) and
(S-Test).

From the correspondence between the symbolic and the
concrete transition we obtain $\llbracket e \rrbracket_\eta = \mathrm{i}1$. Additionally
$\eta' = \eta$ by minimality of $\eta'$. The lemma follows by rule
(I-Cond-True). ■
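Lemma 4 There exists a fixed polynomial $p$ such that for
any $P \in \mathrm{CVM}$ with $\llbracket P \rrbracket_S \neq \bot$

$\llbracket P \rrbracket_C \lesssim_p \llbracket \llbracket P \rrbracket_S \rrbracket_I$.

Proof Let $P$ be a CVM program such that $\llbracket P \rrbracket_S \neq \bot$. Let
$T = \llbracket P \rrbracket_C$ and $\tilde T = \llbracket \llbracket P \rrbracket_S \rrbracket_I$. We shall show that $T \lesssim_p \tilde T$
for some polynomial $p$ by giving a relation $\lesssim$ between states
of $T$ and $\tilde T$ as well as a translation function $\tau$ that satisfy
definition 4. Let $s_1, \ldots, s_n$ be the symbolic execution trace
of $P$ with labels $\lambda_1, \ldots, \lambda_{n-1}$ and let $\tilde P_i = \lambda_i \ldots \lambda_{n-1} 0 \in \mathrm{IML}$.
This way $\tilde P_1 = \llbracket P \rrbracket_S$ and $\tilde P_n = 0$. Let $\mathcal{P}_1, \ldots, \mathcal{P}_m$ be
protocol states over $T$ such that $\mathcal{P}_1 = \{\varepsilon \mapsto (\eta_1, (\mathrm{Init}, P))\}$
for some initial environment $\eta_1$ and there is a transition
$\mathcal{P}_i \to \mathcal{P}_{i+1}$ with a command $(h_i, d_i)$ and an action
$a_i$ for each $i$. As CVM does not perform replication, each
protocol state will be of the form $\mathcal{P}_i = \{h_i \mapsto (\eta_i, c_i)\}$ for
some state $c_i$ and valuation $\eta_i$.

No concrete trace of CVM is longer than the symbolic
trace (both are bounded by the number of instructions in
$P$), so clearly $m \le n$. By definition the initial symbolic
state $s_1 = (\mathrm{Init}, P)$ is $\eta_1$-consistent and $\mathrm{conc}_{\eta_1}(s_1) = c_1$.
By setting $\tilde\eta_1 = \eta_1$ and repeatedly applying lemma 2 we
obtain a sequence $\tilde\eta_1, \ldots, \tilde\eta_m$ of valuations such that for each
$i$ the valuation $\tilde\eta_i$ is an extension of $\eta_i$, the state $s_i$ is
$\tilde\eta_i$-consistent and $\mathrm{conc}_{\tilde\eta_i}(s_i) = c_i$. Additionally we can choose
the valuations such that $\tilde\eta_{i+1}$ is a minimal extension of $\tilde\eta_i$
satisfying the property. For each $i = 1, \ldots, m$ we define a
protocol state $\tilde{\mathcal{P}}_i$ over $\tilde T$ as $\tilde{\mathcal{P}}_i = \{\tilde h_i \mapsto (\mathrm{var}(\tilde\eta_i), \tilde P_i)\}$, where
$\tilde h_i$ is obtained from $h_i$ as follows: Let $I \subseteq \{1, \ldots, n-1\}$ be
the set of indices $i$ such that $\tilde P_i \neq \tilde P_{i+1}$. Given a history $h_i$
of the form $h_i = o_1 1 \ldots o_{i-1} 1$ (every CVM rule only has one
process on the right hand side, so the replication identifier
is always 1) let $\tilde h_i = o_{i_1} 1 \ldots o_{i_k} 1$, where $\{i_1, \ldots, i_k\} = I \cap \{1, \ldots, i-1\}$.

Given a protocol state $\mathcal{P}$ over $T$ and a protocol state $\tilde{\mathcal{P}}$
over $\tilde T$ we define $\mathcal{P} \lesssim \tilde{\mathcal{P}}$ iff there exist sequences of states
$\mathcal{P}_1, \ldots, \mathcal{P}_m$ and $\tilde{\mathcal{P}}_1, \ldots, \tilde{\mathcal{P}}_m$ as above such that $\mathcal{P} = \mathcal{P}_i$ and
$\tilde{\mathcal{P}} = \tilde{\mathcal{P}}_i$ for some $i$. We define the function $\tau$ from commands
to sequences of commands as follows:

$\tau((h, d)) = (\tilde h, d)$ if $h = o_1 1 \ldots o_{i-1} 1$ and $i \in I$,
$\tau((h, d)) = \varepsilon$ otherwise.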



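We now show that the relation $\lesssim$ and the function $\tau$ satisfy
definition 4, so that $T \lesssim_p \tilde T$ for some polynomial $p$. The
conditions in definition 4 are satisfied as follows:

1. Any initial valuation $\eta_1$ is not an extended valuation so
that $\mathrm{var}(\eta_1) = \eta_1$. By definition

$\{\varepsilon \mapsto (\eta_1, (\mathrm{Init}, P))\} \lesssim \{\varepsilon \mapsto (\mathrm{var}(\eta_1), \tilde P_1)\} = \{\varepsilon \mapsto (\eta_1, \llbracket P \rrbracket_S)\}$.

2. Let $\mathcal{P} \lesssim \tilde{\mathcal{P}}$ and assume that there exists a transition
$\mathcal{P} \xrightarrow{(h,d),\,a} \mathcal{P}'$. By definition of the relation $\lesssim$ there
exist sequences of states $\mathcal{P}_1, \ldots, \mathcal{P}_m$ and $\tilde{\mathcal{P}}_1, \ldots, \tilde{\mathcal{P}}_m$ as
above such that $\mathcal{P} = \mathcal{P}_i$, $\mathcal{P}' = \mathcal{P}_{i+1}$ and $\tilde{\mathcal{P}} = \tilde{\mathcal{P}}_i$ for
some $i < m$. It suffices to show that

$\tilde{\mathcal{P}}_i \xrightarrow{\tau((h,d)),\,a} \tilde{\mathcal{P}}_{i+1}$. (*)

If $i \in I$ then (*) follows from lemma 3. Let $i \notin I$, that
is $\lambda_i = \varepsilon$ in the symbolic execution. Inspecting the
proof of lemma 2 we see that then $\mathrm{var}(\tilde\eta_i) = \mathrm{var}(\tilde\eta_{i+1})$
and so $\tilde{\mathcal{P}}_i = \tilde{\mathcal{P}}_{i+1}$. The program performs no action so
that $a = \varepsilon$ and by definition $\tau((h, d)) = \varepsilon$, thus (*) is
satisfied.

3. To compute $\tau$ it is necessary to know $I$, but this can
be computed by an inspection of $P$ in linear time: it
is $i + 1 \in I$ iff the $i$th instruction in $P$ is one of In,
Out, or Test, that is, an instruction that generates a
nonempty label $\lambda$ in the symbolic execution. Thus $\tau(c)$
is computable in time linear in $|c| + |P|$.

4. Assume that for some valuation $\eta$ and attackers $E$ and
$\tilde E$ the machine $M = \mathrm{Exec}_\eta(T, E)$ reaches a state $\mathcal{P}$ in
$t$ steps and the machine $\tilde M = \mathrm{Exec}_\eta(\tilde T, \tilde E)$ reaches a
state $\tilde{\mathcal{P}}$ in $\tilde t$ steps and $\mathcal{P} \lesssim \tilde{\mathcal{P}}$. If $\eta$ is the environment
of the process in $\mathcal{P}$ and $\tilde\eta$ is the environment of the
process in $\tilde{\mathcal{P}}$ then $\tilde\eta$ is an extension of $\eta$, in fact $\tilde\eta = \eta$,
as both environments get updated by rules (C-In) and
(I-Nonce), (I-In) in the same way. It is easy to see that

$\tilde t = O(\tilde n_{tr} \cdot (\tilde t_e + |\llbracket P \rrbracket_S| + |\tilde\eta|))$,

where $\tilde n_{tr}$ is the number of transitions performed by
$\tilde M$ and $\tilde t_e$ is the number of steps to evaluate the most
expensive IML expression during the execution of $\tilde M$.
All of these values can be bounded in terms of $t$ as
follows: The IML model $\llbracket P \rrbracket_S$ performs at most the
same number of transitions as the PTS program $P$, so
that $\tilde n_{tr} \le n_{tr} \le t$, where $n_{tr}$ is the number of transitions
executed by $M$. By construction of the symbolic
execution $|\llbracket P \rrbracket_S| = O(|P|) = O(t)$. Furthermore
$|\tilde\eta| = |\eta| \le t$. Finally we shall prove by induction that
if $\tilde t_e$ is the number of steps to evaluate some expression $e$
then $\tilde t_e \le O(t) \cdot |e|$. Consider the following cases:

$e = b$ for some $b \in BS$. In this case $\tilde t_e = |e|$.

$e = x$ for some $x \in \mathrm{Var}$. In this case $\llbracket e \rrbracket_\eta = \eta(x)$,
so that $\tilde t_e \le t$.

$e = op(e_1, \ldots, e_n)$ with some $op \in \mathrm{Ops}$. For bitstrings
$b_1, \ldots, b_n \in BS$ let $t_{op}(b_1, \ldots, b_n)$ be the
number of steps to evaluate $A_{op}(b_1, \ldots, b_n)$ and let
$\tilde t_i$ be the number of steps to evaluate $\llbracket e_i \rrbracket_\eta$. Every
operation in $e$ is also performed by $M$, thus

$\tilde t_e = t_{op}(\llbracket e_1 \rrbracket_\eta, \ldots, \llbracket e_n \rrbracket_\eta) + \tilde t_1 + \ldots + \tilde t_n$
$\;\; \le t_{op}(\llbracket e_1 \rrbracket_\eta, \ldots, \llbracket e_n \rrbracket_\eta) + \sum_i |e_i| \cdot O(t)$
$\;\; \le O(t) \cdot \left(\sum_i |e_i| + 1\right)$
$\;\; \le O(t) \cdot |e|$.

$e = e_1 | e_2$. Let $\tilde t_1$ and $\tilde t_2$ be the number of steps to
evaluate $\llbracket e_1 \rrbracket_\eta$ and $\llbracket e_2 \rrbracket_\eta$ respectively. Then

$\tilde t_e \le \tilde t_1 + \tilde t_2 + |\llbracket e_1 \rrbracket_\eta| + |\llbracket e_2 \rrbracket_\eta|$
$\;\; \le 2 \cdot (\tilde t_1 + \tilde t_2) \le O(t) \cdot (|e_1| + |e_2|) \le O(t) \cdot |e|$.

The cases $e = e'\{e_o, e_l\}$ and $e = \mathrm{len}(e')$ are proved
analogously to the case $e = e_1 | e_2$. ■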



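Restatement of theorem 1 There exists a fixed polynomial
$p$ such that if $P_1, \ldots, P_n$ are CVM processes and for
each $i$ $\tilde P_i := \llbracket P_i \rrbracket_S \neq \bot$ then for any IML process $P_E$, any
trace property $\rho$, and resource bound $t \in \mathbb{N}$

$\mathrm{insec}(\llbracket P_E[P_1, \ldots, P_n] \rrbracket_{CI}, \rho, t)$
$\;\; \le \mathrm{insec}(\llbracket P_E[\tilde P_1, \ldots, \tilde P_n] \rrbracket_I, \rho, p(t))$.

Proof By lemma 4 there exists a polynomial $p_1$ such that
$\llbracket P_i \rrbracket_C \lesssim_{p_1} \llbracket \tilde P_i \rrbracket_I$ for each $i$. By lemma 1 the PTS $\llbracket P_E \rrbracket_I$
is a PTS with holes identifiable in $p_2$-time for some fixed
polynomial $p_2$. Applying definition 7 and theorem 5 we see
that there exists a polynomial $p_3$ depending only on $p_1$ and
$p_2$ (and thus fixed) such that

$\llbracket P_E[P_1, \ldots, P_n] \rrbracket_{CI} \lesssim_{p_3} \llbracket P_E \rrbracket_I[\llbracket \tilde P_1 \rrbracket_I, \ldots, \llbracket \tilde P_n \rrbracket_I]$
$\;\; = \llbracket P_E[\tilde P_1, \ldots, \tilde P_n] \rrbracket_I$.

By theorem 4 there exists a polynomial $p_4$ depending only on
$p_3$ (and thus fixed) such that theorem 1 holds with $p = p_4$. □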

G. VERIFICATION OF IML— DETAILS 

We show how to simplify IML to the applied pi calculus 
that can be verified using ProVerif. As ProVerif works in the 
symbolic model, we shall employ a computational soundness 
result from [4] to justify its use. The result will guarantee
that if ProVerif successfully verifies the translated pi calcu- 
lus process then the process is asymptotically secure in our 
computational model. We start by illustrating the method 
on an example and then give a general description. 

The main challenge when translating IML to the pi cal- 
culus is that IML processes contain bitstring manipulation 
primitives that are not valid in pi. An example of such a 
process is shown in fig. 17 — it is an adapted excerpt from an 
IML model of the Needham-Schroeder-Lowe protocol implementation
used in one of our experiments (the full model is
shown in appendix H). The key observation is that the bitstring
manipulation expressions in IML are most commonly
employed to provide the tupling functionality. In our example
the process A uses concatenations to construct a computational
representation of the pair of $n_A$ and $pk_A$. Similarly,
process B uses range expressions to extract the second ele- 
ment of the pair. The idea of the translation is thus to enrich 
Ops with encoding and parsing operations with meanings 
given by the bitstring manipulation expressions. This way 
we hide the direct bitstring manipulation inside new opaque 
operations. Of course, to obtain a soundness result we need 
to prove certain properties of the extracted operations to 
make sure that they correctly implement tupling. 

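In our example we introduce new operations $conc_1$ and
$parse_2$ with implementations given by

$A_{conc_1}(b_1, b_2) = \text{"msg1"} \,|\, \mathrm{len}(b_1) \,|\, b_1 \,|\, b_2$,

$A_{parse_2}(b) =$
  if $\llbracket \neg(b\{\mathrm{i}4, \mathrm{i}N\} +_b \mathrm{i}N +_b \mathrm{i}4 \le \mathrm{len}(b)) \rrbracket$ then $\bot$ else
  if $\llbracket \neg(b\{\mathrm{i}0, \mathrm{i}4\} = \text{"msg1"}) \rrbracket$ then $\bot$ else
  $\llbracket b\{\mathrm{i}4 +_b \mathrm{i}N +_b b\{\mathrm{i}4, \mathrm{i}N\},\; \mathrm{len}(b) -_b \mathrm{i}4 -_b \mathrm{i}N -_b b\{\mathrm{i}4, \mathrm{i}N\}\} \rrbracket$.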

In the implementation of the parsing expression we keep all 
the condition checks that are performed by the IML process 
before applying the parser. We follow the convention of IML 
that i1 and i0 represent truth values of bitstrings. Using the



A
    (νnA); (νr);
    let m1 = "msg1" | len(nA) | nA | pkA in
    let e1 = encrypt(pkX, m1) in
    out(e1); ...

B
    in(e1);
    let m1 = decrypt(skB, e1) in
    if m1{i4, iN} +b iN +b i4 ≤ len(m1) then
    if m1{i0, i4} = "msg1" then
    let x1 = m1{i4 +b iN +b m1{i4, iN},
                len(m1) −b i4 −b iN −b m1{i4, iN}} in
    if x1 = pkX then ...



Figure 17: An excerpt from the IML process for 
the NSL protocol. An expression len(. . .) produces a 
result of fixed length iN. 



A
    (νnA); (νr);
    out(encrypt(pkX, conc1(nA, pkA)));

B
    in(e1);
    let m1 = decrypt(skB, e1) in
    let x1 = parse2(m1) in
    if x1 = pkX then ...



Figure 18: An excerpt from the pi calculus transla- 
tion for the NSL protocol. 



new operations we can simplify our example IML process 
to the pi calculus process shown in fig. 18, removing the if- 
statements that have been absorbed into the implementation 
of $parse_2$.
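To make the encoder and parser of the example concrete, here is a minimal Python sketch on byte strings. It is only an illustration under stated assumptions: the length field width `N = 8` bytes and the big-endian encoding of lengths are choices made here, not taken from the implementation analysed in the paper, and `None` stands in for $\bot$.

```python
from typing import Optional

TAG = b"msg1"   # the 4-byte tag of the example
N = 8           # assumed width in bytes of a len(...) field (the paper's iN)


def conc1(b1: bytes, b2: bytes) -> bytes:
    """Compute "msg1" | len(b1) | b1 | b2 with a fixed-width length field."""
    return TAG + len(b1).to_bytes(N, "big") + b1 + b2


def parse2(b: bytes) -> Optional[bytes]:
    """Extract the second component, or None (standing in for bottom)."""
    if len(b) < len(TAG) + N:
        # The range expression b{i4, iN} would already be undefined here.
        return None
    len1 = int.from_bytes(b[4:4 + N], "big")
    if not (len1 + N + 4 <= len(b)):
        return None          # first check absorbed from the IML process
    if b[0:4] != TAG:
        return None          # second check absorbed from the IML process
    return b[4 + N + len1:]
```

For instance, `parse2(conc1(b"nonce", b"pubkey"))` yields `b"pubkey"`, while any input with a different tag is rejected.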

The syntax of the applied pi calculus is shown in fig. 19. 
It is a strict subset of the IML syntax with the following 
differences: 

• The bitstring operations are no longer available. 

• The only allowed form of the restriction operator is $(\nu x)$
with the same meaning as described in section 4.

• Parameters of events are restricted to be fixed bitstrings.
This is a limitation of the result in [4].

• The conditional expression of IML with truth meanings
for bitstrings i0 and i1 is no longer available. Instead
we can use let expressions to conditionally choose based
on equality of bitstrings by assuming that there exists
an operation $eq \in \mathrm{Ops}$ such that $A_{eq}(b, b) = b$ and
$A_{eq}(b, b') = \bot$ for all $b \neq b'$.

• The input and output expressions only accept variables 
as parameters — all computations must be performed in 
let-expressions. 

The calculus shown in fig. 19 is a restricted version of 
the pi calculus presented in [4], as we do not need the full 
generality used there. Our restrictions are as follows: 

• There is only one public communication channel. 

• We do not make a distinction between variables and 
names, as they behave identically for the purpose of 
the computational execution. 
b ∈ BS, x ∈ Var, op ∈ Ops

e ∈ PExp ::=                    expression
    x                           variable
    op(e1, ..., en)             constructor/destructor

P, Q ::=                        process
    0                           nil
    !P                          replication
    P|Q                         parallel composition
    (νx); P                     randomness
    in(x); P                    input
    out(x); P                   output
    event(b); P                 event
    let x = e in P [else Q]     evaluation

Figure 19: The syntax of the applied pi calculus. 



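$\llbracket x \rrbracket^k_\eta = \eta(x)$, for $x \in \mathrm{Var}$,

$\llbracket op(e_1, \ldots, e_n) \rrbracket^k_\eta = \hat{A}_{op}(k, \llbracket e_1 \rrbracket^k_\eta, \ldots, \llbracket e_n \rrbracket^k_\eta)$.

Figure 20: The evaluation of pi expressions, whereby
$\bot$ propagates.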



• We only allow computations in let-expressions, so that 
we do not make a distinction between constructors and 
destructors in the syntax. 

Unlike CVM and IML which execute with regard to a
fixed security parameter $k_0$ introduced in section 2, the
computational semantics of the applied pi calculus is
parameterised by a security parameter. In order to achieve that
we assume that the operations in Ops possess a generalised
implementation $\hat{A}$ such that $\hat{A}_{op} : \mathbb{N} \times BS^{ar(op)} \to BS$ is the
implementation of an operation $op \in \mathrm{Ops}$ that takes the
security parameter as the first argument. For a security
parameter $k$ and inputs $m$ the value $\hat{A}_{op}(k, m)$ should be
computable in time polynomial in $k + |m|$. We require that
$\hat{A}_{op}(k_0, \cdot) = A_{op}$ for each $op \in \mathrm{Ops}$.



The semantics of the pi calculus is directly derived from
the semantics of IML. Given a pi process $P$ and a security
parameter $k$, we define the semantics $\llbracket P \rrbracket^k_\pi$ as follows:
The expression evaluation uses $\hat{A}$ instead of $A$ as shown in
fig. 20. The semantic rules are obtained from the IML rules
(fig. 15) by substituting all expression evaluations $\llbracket e \rrbracket_\eta$ with
$\llbracket e \rrbracket^k_\eta$. The syntactic form $(\nu x)$ behaves as described in
section 4, but now it is no longer syntactic sugar, so we
add a new semantic rule shown in fig. 21:

$\dfrac{b \in BS, \quad |b| = k, \quad r = \hat{A}_{nonce}(k, b)}{(\eta, (\nu x); P) \to \{(\eta\{x \mapsto r\}, P)\}}$ (pi-Nonce)
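The evaluation of pi expressions with a security parameter and propagating $\bot$ can be sketched in a few lines of Python. This is an illustrative sketch only: the tuple representation of expressions, the names `evaluate`, `eq`, `pair`, and `fst`, and the 4-byte length prefix used by `pair` are assumptions made for the example; `None` stands in for $\bot$, and an unbound variable is treated as $\bot$.

```python
from typing import Callable, Dict, Optional, Tuple, Union

Bits = Optional[bytes]                  # None stands in for bottom
Expr = Union[str, Tuple]                # "x" or ("op", e1, ..., en)
Impl = Dict[str, Callable[..., Bits]]   # op name -> implementation, k first


def evaluate(e: Expr, env: Dict[str, bytes], impl: Impl, k: int) -> Bits:
    """Evaluate a pi expression under env with security parameter k."""
    if isinstance(e, str):
        return env.get(e)               # variable lookup
    op, *args = e
    values = []
    for a in args:
        v = evaluate(a, env, impl, k)
        if v is None:
            return None                 # bottom propagates to the result
        values.append(v)
    return impl[op](k, *values)


def eq(k: int, b1: bytes, b2: bytes) -> Bits:
    """eq(b, b) = b and eq(b, b') = bottom for b != b'."""
    return b1 if b1 == b2 else None


def pair(k: int, b1: bytes, b2: bytes) -> Bits:
    """A simple length-prefixed pairing (hypothetical encoding)."""
    return len(b1).to_bytes(4, "big") + b1 + b2


def fst(k: int, b: bytes) -> Bits:
    """First projection of the pairing above, bottom on malformed input."""
    if len(b) < 4:
        return None
    n = int.from_bytes(b[:4], "big")
    return b[4:4 + n] if 4 + n <= len(b) else None
```

A let-expression with an else branch then simply tests whether the evaluated expression is `None`.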

We now give details regarding the translation procedure 
from IML to pi. In the following we shall assume that the 
IML processes do not contain else-branches; this is true for 
the processes produced by the symbolic execution. Remov- 
ing if-statements from such processes does not reduce the 
set of traces and thus does not reduce insecurity. We shall 
therefore divide all if-statements into two groups: the cryp- 
tographic statements, that are likely to be relevant for the 
security of the process and should be kept in the translation, 
and the auxiliary statements that can be removed from the 
process without affecting security. The exact choice does 
not affect the soundness of the approach, but removing too 
many if statements might make the resulting pi process in- 
secure, and removing too few may prevent the successful 
translation from IML to pi. We use the following heuristic:
an if-statement is considered to be cryptographic iff it
is of the form if $e_1 = e_2$ then $P$, where both $e_1$ and $e_2$ are
variables or applications of cryptographic operations.
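The heuristic can be phrased as a small syntactic check. The following Python sketch is an illustration under assumptions: the expression classes `Var` and `App` and the set `CRYPTO_OPS` of operation names are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class App:
    op: str
    args: Tuple["Expr", ...]


Expr = Union[Var, App]

# Hypothetical set of operation names treated as cryptographic.
CRYPTO_OPS = {"encrypt", "decrypt", "hmac", "sign", "verify"}


def is_crypto_operand(e: Expr) -> bool:
    """A variable or an application of a cryptographic operation."""
    return isinstance(e, Var) or (isinstance(e, App) and e.op in CRYPTO_OPS)


def is_cryptographic_test(cond: Expr) -> bool:
    """True iff cond has the form e1 = e2 with both sides crypto operands."""
    return (
        isinstance(cond, App)
        and cond.op == "="
        and len(cond.args) == 2
        and all(is_crypto_operand(a) for a in cond.args)
    )
```

Under this check the test `x1 = pkX` of fig. 17 is cryptographic, whereas the tag comparison and the length comparison are auxiliary.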

Given an IML process P we perform on it the following 
operations: 

• Introduce intermediate let-statements so that all out-statements
only contain variables, all cryptographic if-statements
are of the form if $x_1 = x_2$ then $P$ with variables
$x_1$ and $x_2$, and every expression in the new let
statements is of one of three types:

— an encoding expression, that is, an expression con- 
taining only concrete bitstrings, len(), concatena- 
tions, arithmetic operations, and variables, 

— a parsing expression, that is, an expression contain- 
ing only concrete bitstrings, len(), substring extrac- 
tion, arithmetic operations, and a single variable, 

— a cryptographic expression, that is, an expression 
containing only variables and cryptographic oper- 
ations. 

As an example, the IML processes in fig. 17 are already 
written in such a form. 

• For each subprocess $P' = (\text{let } y = e \text{ in } P'')$, where $e$ is
an encoding expression with variables $x_1, \ldots, x_n$, add
a new encoding operation $c$ of arity $n$ to Ops with the
implementation given by

$A_c(b_1, \ldots, b_n) = \llbracket e[b_1/x_1, \ldots, b_n/x_n] \rrbracket$.

Now substitute $P'$ by let $y = c(x_1, \ldots, x_n)$ in $P''$.
In order to justify modelling the encoding operations as
tuples symbolically, we need to check that their computational
implementations fulfil certain conditions. The
first condition is:

(C1) the ranges of the functions $A_c$ introduced above are
disjoint.

Checking the side conditions is described in appendix G.1.

Figure 21: Randomness generation in pi calculus.

• For each subprocess $P' = (\text{let } y = e \text{ in } P'')$, where $e$ is
a parsing expression with a variable $x$, add a new parsing
operation $p$ of arity 1 to Ops. We need to check
that before computing $e$ the process $P$ makes sure that
$x$ contains a result of a suitable encoding operation.
More specifically, we check that there exists an encoding
operation $c$ such that the process rejects any $x$ with
the value outside the range of $A_c$ and such that $e$ computes
an inverse of $A_c$. Let $e_1, \ldots, e_n$ be the expressions
such that $P$ contains an auxiliary if-statement of the
form if $e_i$ then ... above $P'$. Let $x_1, \ldots, x_m$
be the variables of $P$ with exception of $x$ and let

$\phi_p = \exists x_1, \ldots, x_m : e_1 \wedge \ldots \wedge e_n$.

This way, whenever $(\eta', P')$ is an executing process in
a protocol state reached by $\llbracket P \rrbracket_I$ from some environment
$\eta$, we have $\llbracket \phi_p \rrbracket_{\eta'} = \mathrm{i}1$. We check the following
conditions:

(C2) there exists an encoding operation $c$ such that for
every $b$ not in the range of $A_c$ it is $\llbracket \phi_p[b/x] \rrbracket = \mathrm{i}0$.
We say that $c$ matches $p$,

(C3) the function $f_p : b \mapsto \llbracket e[b/x] \rrbracket$ is an $i$th inverse of $A_c$
for some $i$, that is, $f_p(A_c(b_1, \ldots, b_n)) = b_i$ where $n$
is the arity of $c$.

Appendix G.1 shows how to check the conditions (C1)-(C3)
and how a successful check results in a quantifier-free
formula $\phi'_p$ with $x$ as the only variable such that
$\phi_p$ implies $\phi'_p$ and the condition (C2) is still satisfied
with $\phi'_p$. Additionally $\phi'_p$ satisfies

(C4) for the encoding operation $c$ that matches $p$ and
any $b$ in the range of $A_c$ it is $\llbracket \phi'_p[b/x] \rrbracket = \mathrm{i}1$.

We define the computational implementation for $p$ as

$A_p(b) = \text{if } \llbracket \phi'_p[b/x] \rrbracket \text{ then } \llbracket e[b/x] \rrbracket \text{ else } \bot$

and substitute $P'$ by let $y = p(x)$ in $P''$.

• Remove all auxiliary if-statements: for every such statement
replace if $e$ then $P'$ by $P'$. Translate all cryptographic
if-statements into the form expected by the pi calculus:
replace every occurrence of if $x_1 = x_2$ then $P$
by let _ = $eq(x_1, x_2)$ in $P$.

If the process $P$ does not contain any else-branches and
the above procedure yields a valid pi process $\tilde P$ then we say
that $P$ is translatable to $\tilde P$. A complete example of an IML
program and its resulting pi calculus translation for the NSL 
protocol is shown in appendix H. 

In order to obtain the computational semantics for the 
translated process, we need to specify the generalised im- 
plementations Ac and Ap for the newly introduced encoders 
and parsers. We can assume any generalisation of these op- 
erations to arbitrary security parameters that satisfies the 
conditions (C1)-(C4). 

Clearly the translation preserves all the action sequences 
of the original process so the following holds: 

Lemma 5 There exists a fixed polynomial $p$ such that for
any IML process $P$ translatable to a pi process $\tilde P$

$\llbracket P \rrbracket_I \lesssim_p \llbracket \tilde P \rrbracket_\pi$.

Applying theorem 4 we obtain a statement that links the
security of the pi translation to the security of the original
IML process:

Restatement of theorem 2 There exists a fixed polynomial
$p$ such that for any IML process $P$ translatable to a
pi process $\tilde P$, any trace property $\rho$ and resource bound $t \in \mathbb{N}$

$\mathrm{insec}(\llbracket P \rrbracket_I, \rho, t) \le \mathrm{insec}(\llbracket \tilde P \rrbracket_\pi, \rho, p(t))$.

Now that we have translated IML to pi, we can enumerate 
the conditions under which the resulting pi process can be 



soundly verified using ProVerif. For this purpose we shall 
make use of a computational soundness result from [4], which
places restrictions on the operation set Ops as well as on the
shape of the pi process. More specifically, the computational
soundness theorem is proved there for the set of constructors
C = {E/3, ek/1, dk/1, pair/2} and destructors D =
{D/2, isenc/1, isek/1, ekof/1, fst/1, snd/1, eq/2}. The result
includes soundness for signatures, but we omit them
as they have not been used in our experiments so far. For 
simplicity the result presented here uses only one pairing 
construct (as in [4]), but it can be easily extended to an arbi- 
trary number of tupling constructors and destructors, to cor- 
respond to our encoding and parsing operations introduced 
during the translation from IML. The symbolic behaviour of 
the operations is defined by the following equations: 

$D(dk(t_1), E(ek(t_1), m, t_2)) = m$,
$isenc(E(ek(t_1), t_2, t_3)) = E(ek(t_1), t_2, t_3)$,
$isek(ek(t)) = ek(t)$,
$ekof(E(ek(t_1), m, t_2)) = ek(t_1)$,
$fst(pair(x, y)) = x$,
$snd(pair(x, y)) = y$,
$eq(x, x) = x$.

Let $\mathrm{Ops}^\pi = C \cup D \cup \{nonce\}$. The soundness conditions
that the implementations $A_x$ for $x \in \mathrm{Ops}^\pi$ need to satisfy
are as follows:

1. There are disjoint and efficiently computable sets of bitstrings
representing the types nonces, ciphertexts, encryption
keys, decryption keys, and pairs. Let $\mathrm{Nonces}_k$
denote the set of all nonces for a security parameter $k$.

2. Given $b \in BS$ with $|b| = k$ chosen uniformly at random,
$A_{nonce}(k, b)$ returns $r \in \mathrm{Nonces}_k$ uniformly at random.

3. The functions $A_E$, $A_{ek}$, $A_{dk}$, and $A_{pair}$ are length-regular:
the length of their result depends only on the
lengths of their parameters. All $m \in \mathrm{Nonces}_k$ have the
same length.

4. Every image of $A_E$ is of type ciphertext, every image of
$A_{ek}$ and $A_{ekof}$ is of type encryption key, every image
of $A_{dk}$ is of type decryption key.

5. For all $m_1, m_2 \in BS$ we have $A_{fst}(A_{pair}(m_1, m_2)) = m_1$
and $A_{snd}(A_{pair}(m_1, m_2)) = m_2$. Every $m$ of type
pair is in the range of $A_{pair}$. If $m$ is not of type pair,
$A_{fst}(m) = A_{snd}(m) = \bot$.

6. $A_{ekof}(A_E(p, x, y)) = p$ for all $p$ of type encryption key,
$x \in BS$, and a nonce $y$. $A_{ekof}(e) \neq \bot$ for any $e$ of type
ciphertext and $A_{ekof}(e) = \bot$ for any $e$ that is not of
type ciphertext.

7. $A_E(p, m, y) = \bot$ if $p$ is not of type encryption key.

8. $A_D(A_{dk}(r), m) = \bot$ if $r \in \mathrm{Nonces}_k$ and $A_{ekof}(m) \neq A_{ek}(r)$.

9. $A_D(A_{dk}(r), A_E(A_{ek}(r), m, r')) = m$ for all $r, r' \in \mathrm{Nonces}_k$.

10. $A_{isek}(x) = x$ for any $x$ of type encryption key. $A_{isek}(x) = \bot$
for any $x$ not of type encryption key.

11. $A_{isenc}(x) = x$ for any $x$ of type ciphertext. $A_{isenc}(x) = \bot$
for any $x$ not of type ciphertext.

12. We define an encryption scheme (KeyGen, Enc, Dec)
as follows: KeyGen picks a random $r$ in $\mathrm{Nonces}_k$ and
returns $(A_{ek}(r), A_{dk}(r))$. $Enc(p, m)$ picks a random
$r$ in $\mathrm{Nonces}_k$ and returns $A_E(p, m, r)$. $Dec(k, c)$ returns
$A_D(k, c)$. We require that the defined encryption
scheme is IND-CCA secure.

13. For all $e$ of type encryption key and $m \in BS$ the probability
that $A_E(e, m, r) = A_E(e, m, r')$ for uniformly
chosen $r, r' \in \mathrm{Nonces}_k$ is negligible.

m, n ::= x | pair(m, n)

e ::= m | isek(e) | isenc(e) | D(xd, e) | fst(e)
    | snd(e) | ekof(e) | eq(e, e)

P, Q ::= out(x); P | in(x); P | 0 | !P | (P|Q) | (νx); P
    | let x = e in P [else Q] | event(b); P
    | (νr); let x = ek(r) in let xd = dk(r) in P
    | (νr); let x = E(isek(D1), D2, r) in P [else Q]

Figure 22: The syntax of key-safe processes.

The conditions on the pairing operations follow from the 
conditions (C1)-(C4) checked during the translation (length- 
regularity is fulfilled for any function given by an IML en- 
coding expression), the other conditions (in particular that 
the encryption is IND-CCA) shall be assumed, because we 
are treating cryptographic operations as black boxes and not 
trying to verify them. The condition that all functions have 
disjoint ranges is quite restrictive and is unlikely to be ful- 
filled in actual implementations. For this reason in future 
we would like to use CryptoVerif to verify our models, to 
bypass the need for complex soundness conditions. 

The soundness result of [4] is proved for a class of the so- 
called key-safe processes. In a nutshell, key-safe processes 
always use fresh randomness for encryption and key gen- 
eration and only use honestly generated (that is, through 
key generation) decryption keys for decryption. Decryption 
keys may not be sent around (in particular, this avoids the 
key-cycle problems). The grammar of key-safe processes is 
summarised in fig. 22. We let $x$, $x_d$, $k_s$, and $r$ stand for
different sets of variables: general purpose, decryption key, 
signing key, and randomness variables. 

Lemma 6 (Computational soundness [4]) If a closed key-safe
process symbolically satisfies a trace property $\rho$ then it
computationally satisfies $\rho$. □

We now sketch the proof of theorem 3
from section 7. For a process $P$ let $\mathrm{Ops}_P$ be the set of
operations used by $P$ (including the nonce operation). The
symbolic semantics and security of pi are defined in [4]. We
do not detail the semantics here, as we only need to know
that it is exactly the semantics that is used by ProVerif.

A function $f : \mathbb{N} \to \mathbb{R}$ is called negligible if for every $c \in \mathbb{N}$
there exists $n_0 \in \mathbb{N}$ such that $f(n) < 1/n^c$ for all $n > n_0$.
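As a quick illustration of this definition (an added example, not taken from [4]): the function $f(n) = 2^{-n}$ is negligible, whereas $g(n) = n^{-2}$ is not.

```latex
\[
  f(n) = 2^{-n}: \quad
  \text{for every } c \in \mathbb{N} \text{ we have } n^{c}/2^{n} \to 0,
  \text{ so there is } n_0 \text{ with } 2^{-n} < n^{-c} \text{ for all } n > n_0 ;
\]
\[
  g(n) = n^{-2}: \quad
  \text{for } c = 3 \text{ the inequality } n^{-2} < n^{-3}
  \text{ is equivalent to } n < 1 \text{ and thus fails for every } n \ge 1 .
\]
```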

Restatement of theorem 3 Let $P$ be a pi process such
that $\mathrm{Ops}_P \subseteq \mathrm{Ops}^\pi$ and the soundness conditions are satisfied.
If $P$ is key-safe and symbolically secure with respect to
a trace property $\rho$ then for every polynomial $p$ the following
function is negligible in $k$:

$\mathrm{insec}(\llbracket P \rrbracket^k_\pi, \rho, p(k))$.



A
    (νnA); (νr);
    let m1 = "msg1" | len(nA) | nA | pkA in
    let e1 = encrypt(pkX, m1) in
    out(e1); ...

B
    in(e1);
    let m1 = decrypt(skB, e1) in
    if len(pkX) +b iN +b i20 +b i4 = len(m1) then
    if m1{i0, i4} = "msg1" then
    if m1{i4, iN} = i20 then
    let x1 = m1{i4 +b iN +b m1{i4, iN},
                len(m1) −b i4 −b iN −b m1{i4, iN}} in
    if x1 = pkX then ...



Figure 23: An excerpt from the IML process for the 
NSL protocol (full version). 



The main issue in the proof is to relate the notion of com- 
putational execution in [4] (their definition 18) to our notion 
of computational execution (definition 2). Both definitions 
are very similar. In [4] the state of the protocol consists of 
a single executing process together with valuations for vari- 
ables in the process. In each step the attacker chooses an 
execution context to specify which subprocess of the com- 
plete process is supposed to perform a reduction. In our 
definition the attacker interacts with a multiset of processes, 
selecting the process to be executed by an attached handle. 
It is easy to see that both definitions of the security game 
are equivalent. 

G.1 Parsing Conditions

We show how we check conditions (C1)-(C4) arising dur- 
ing the translation from IML to pi. The checks we perform 
are by no means complete (we might fail to detect that the 
conditions actually hold), but they are suitable for the pro- 
tocols that we encountered so far. We shall use the excerpt 
from the IML process of the NSL protocol shown in fig. 23 as 
an example (fig. 17 contained a slightly simplified version). 

For each encoding operation $c$ and parsing operation $p$ let
$e_c$ and $e_p$ be the IML expressions that they replace. Let $\phi_p$
represent the set of facts that the IML process establishes
before applying $e_p$, as described previously.

To prove (C1) we check that all encoding expressions $e_c$
contain a concrete bitstring (a tag) at the same positions
and that all tags are different. In the example of fig. 23 the
bitstring "msg1" would be such a tag, and we would expect
other messages to contain tags like "msg2", "msg3", etc.
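A check of this kind can be sketched as follows. The sketch is illustrative and makes assumptions beyond the text: encoders are given as lists of hypothetical `(kind, value)` fields, length fields have an assumed fixed width `N`, only the first tag of each encoder is considered, and tags are additionally required to have equal length so that no tag is a prefix of another.

```python
from typing import List, Optional, Tuple

Field = Tuple[str, object]   # ("tag", bytes) | ("len", var) | ("var", var)
N = 8                        # assumed byte width of a len(...) field


def first_tag(fields: List[Field]) -> Optional[Tuple[int, bytes]]:
    """Byte offset and value of the first tag, if it sits at a fixed offset."""
    offset = 0
    for kind, value in fields:
        if kind == "tag":
            return offset, value
        if kind == "len":
            offset += N
        else:
            return None      # a variable-length field precedes any tag
    return None


def ranges_disjoint_by_tag(encoders: List[List[Field]]) -> bool:
    """Sufficient (not necessary) check for pairwise disjoint ranges."""
    tags = [first_tag(f) for f in encoders]
    if any(t is None for t in tags):
        return False         # the check is incomplete and gives up here
    offsets = {off for off, _ in tags}
    lengths = {len(tag) for _, tag in tags}
    values = {tag for _, tag in tags}
    return len(offsets) == 1 and len(lengths) == 1 and len(values) == len(tags)
```

A result of `False` only means that disjointness could not be established by this check, in line with the remark that the checks are not complete.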

To prove (C3) for an encoder $c$ and a parser $p$ we check
that $\mathrm{simplify}_\Sigma(e_p[e_c/x]) = x_i$, where $x$ is the variable of $e_p$
and $x_i$ is one of the variables of $e_c$. As an example, for the
operations $conc_1$ and $parse_2$ introduced at the beginning of
appendix G,

$e_{conc_1} = \text{"msg1"} \,|\, \mathrm{len}(x_1) \,|\, x_1 \,|\, x_2$,

$e_{parse_2} = x\{\mathrm{i}4 +_b \mathrm{i}N +_b x\{\mathrm{i}4, \mathrm{i}N\},\; \mathrm{len}(x) -_b \mathrm{i}4 -_b \mathrm{i}N -_b x\{\mathrm{i}4, \mathrm{i}N\}\}$.

Substituting $e_{conc_1}$ for $x$ in $e_{parse_2}$ we obtain an expression
that simplifies to $x_2$, thus we know that $e_{parse_2}$ computes
the second inverse of $e_{conc_1}$.

Given a parser p and a candidate encoder c, we check 
whether c matches p (C2) as follows: first check that Cc 
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is a concatenation of expressions, each of which is either a 
variable (a concatenation parameter), a length of a variable, 
or a constant expression. Formally e^ is required to be of a 
form ei [ . . . |en, where {1, . . . ,n} = Ij:U IiU It such that for 
all i G /a; it is Ci = Xi for some variable Xi, for all i € /; it is 
Ci = \e.n(xi) for some j € I^ and for all i £ 7t it is e; = hi for 
some constant bitstring hi. We require that all variables and 
length expressions are distinct (no variable repeats twice) 
and that \Ix\ = j/;! + 1, that is, the expression Cc contains 
lengths for all parameters except one — the missing length 
can then be derived from knowing the total length of the 
concatenation. 

Given a bitstring b, in order to check that b is in the range
of $A_c$, it is sufficient to check all the constant (tag) fields and
to check that the sum of the length fields is consistent with
the actual length of b. The following makes this precise.

Given a parsing expression $p_i$, we say that $p_i$ extracts the
ith field from $e_c$ if the following holds: for an expression e let
$e_c[e/e_i]$ be the expression obtained from $e_c$ by substituting
$e_i$ with e. Then for a fresh variable $x'$

$$\mathrm{simplify}_E(p_i[e_c[x'/e_i]/x]) = x',$$

where $E = \mathit{Sop} \cup \{\mathrm{len}(x') = \mathrm{getLen}(e_i)\}$.

Theorem 6 Let c and p be an encoding and a parsing expression
such that $e_c$ is of the form $e_1 | \ldots | e_n$ with $\{1, \ldots, n\} = I_x \cup I_l \cup I_t$
as described above. Assume that for each $i \in I_l \cup I_t$
the formula $\phi_p$ contains a parsing expression $p_i$ as a term,
such that $p_i$ extracts the ith field from $e_c$. Let

$$\phi_{tag} = \bigwedge_{i \in I_t} p_i = b_i,$$

$$\phi_{len} = \sum_{i \in I_l} p_i + \sum_{i \in I_l \cup I_t} \mathrm{getLen}(e_i) \le \mathrm{len}(x).$$

Then a bitstring b is in the range of $A_c$ iff

$$[\![\phi_{tag} \wedge \phi_{len}]\!]_{x \mapsto b} = i1.$$

Proof (sketch) Let $b \in BS$ satisfy the premises of the
theorem. For each $i \le n$ we obtain the length $l_i \in \mathbb{N}$ of the
ith field in b as follows: for each $i \in I_l$ such that $e_i = \mathrm{len}(x_j)$
for some $j \in I_x$ let $l_j$ be the numeric value of $[\![p_i[b/x]]\!]$. For each $i \in I_l \cup I_t$ let
$l_i$ be the numeric value of $[\![\mathrm{getLen}(e_i)]\!]$. For the single $i \in I_x$ such that $\mathrm{len}(x_i)$
is not one of the fields of $e_c$ let $l_i = |b| - \sum_{j \ne i} l_j$. Knowing
the lengths allows us to split b into fields as follows: for each
$i \le n$ let $b_i = b\{\sum_{j < i} l_j, l_i\}$. This is well-defined according
to $\phi_{len}$. Clearly $b = b_1 | \ldots | b_n$. We show that for each i it is
$b_i = [\![e_i[b_j/x_j \mid j \in I_x]]\!]$ as follows.

• If $i \in I_x$ then $e_i = x_i$ and the equality holds trivially.

• If $i \in I_l$ then $e_i = \mathrm{len}(x_j)$ for some $j \in I_x$. By construction
$b_i = \mathrm{bs}(l_j) = \mathrm{bs}(|b_j|)$.

• If $i \in I_t$ then the equality follows from $\phi_{tag}$.

Overall we have shown that $b = [\![e_c[b_j/x_j \mid j \in I_x]]\!]$, so that b
is in the range of $A_c$. ■



Thus checking (C2) reduces to finding appropriate parsers
$p_i$ among the terms of $\phi_p$ and checking that $\phi_p \vdash \phi_{tag} \wedge \phi_{len}$.
Furthermore, by choosing $\phi'_p = \phi_{tag} \wedge \phi_{len}$, we obtain a
quantifier-free formula that satisfies (C2) and (C4), as required
by the translation.

As an example, we can show that (C2) holds for conc1
and parse2 with respect to fig. 23 as follows: the conditions
checked by the process B contain references to parsing
expressions $m_1\{i0, i4\}$ and $m_1\{i4, iN\}$. We check that the first
expression extracts the first field (the tag) from $e_{conc1}$ and
the second expression extracts the second field (the length
of the first parameter). We then observe that the conditions
checked by B imply

$$\phi_{tag} = (m_1\{i0, i4\} = \text{"msg1"}),$$

$$\phi_{len} = (iN +_b m_1\{i4, iN\} +_b i4 \le \mathrm{len}(m_1)).$$

Thus both the tag and the length consistency are properly
checked.
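On concrete bytes, the two conditions of the example amount to a short membership test. The sketch below is an illustration under stated assumptions: the function name in_range_conc1 is hypothetical, and the length field is assumed to be 8 bytes in native byte order.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { TAG_LEN = 4, LEN_LEN = 8 };  /* i4 and iN, with N = 8 assumed */

/* Evaluates phi_tag and phi_len for e_conc1 = "msg1" | len(x1) | x1 | x2. */
bool in_range_conc1(const unsigned char *b, size_t len)
{
    uint64_t l1;

    /* The fixed-width fields must fit before they can be read. */
    if (len < TAG_LEN + LEN_LEN) return false;

    /* phi_tag: b{i0, i4} = "msg1" */
    if (memcmp(b, "msg1", TAG_LEN) != 0) return false;

    /* phi_len: iN + b{i4, iN} + i4 <= len(b), rearranged to avoid overflow */
    memcpy(&l1, b + TAG_LEN, LEN_LEN);
    return l1 <= len - TAG_LEN - LEN_LEN;
}
```

A buffer that passes this test can be split into tag, length, first parameter and second parameter, where the second parameter takes whatever bytes remain.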

Our implementation currently checks all the conditions
automatically except $\phi_p \vdash \phi_{len}$. The reason is that we are
planning to use CryptoVerif as a verification backend and
expect to be able to relax the parsing conditions there.

H. NSL EXAMPLE CODE 

We show all the stages of the verification of the NSL
example discussed in section 8.

H.1 Client Source

The source code of the client is shown below. In our
example N = sizeof(size_t) = 8 and $k_0$ corresponds to
SIZE_NONCE, which is set to be 20.



#include <net.h>
#include <lib.h>

#include <proxies/common.h>

#include <string.h>
#include <stdio.h>

// #define LOWE_ATTACK

int main(int argc, char ** argv)
{
  unsigned char * pkey, * skey, * xkey;
  size_t pkey_len, skey_len, xkey_len;

  unsigned char * m1, * m1_all;
  unsigned char * Na;
  size_t m1_len, m1_e_len, m1_all_len;

  unsigned char * m2, * m2_e;
  unsigned char * xNb;
  size_t m2_len, m2_e_len;
  size_t m2_l1, m2_l2;

  unsigned char * m3_e;
  size_t m3_e_len;

  unsigned char * p;

  // for encryption tags
  unsigned char * etag = malloc(4);

  BIO * bio = socket_connect();

  pkey = get_pkey(&pkey_len, 'A');
  skey = get_skey(&skey_len, 'A');
  xkey = get_xkey(&xkey_len, 'A');

  /* Send message 1 */

  m1_len = SIZE_NONCE + 4 + pkey_len + sizeof(size_t);
  p = m1 = malloc(m1_len);

  memcpy(p, "msg1", 4);
  p += 4;
  * (size_t *) p = SIZE_NONCE;
  p += sizeof(size_t);
  Na = p;
  nonce(Na);
  p += SIZE_NONCE;
  memcpy(p, pkey, pkey_len);

  m1_e_len = encrypt_len(xkey, xkey_len, m1, m1_len);
  m1_all_len = m1_e_len + sizeof(size_t) + 4;
  m1_all = malloc(m1_all_len);
  memcpy(m1_all, "encr", 4);
  m1_e_len =
    encrypt(xkey, xkey_len, m1, m1_len,
            m1_all + sizeof(m1_e_len) + 4);
  m1_all_len = m1_e_len + sizeof(size_t) + 4;
  * (size_t *) (m1_all + 4) = m1_e_len;

  send(bio, m1_all, m1_all_len);

  /* Receive message 2 */

  recv(bio, etag, 4);
  recv(bio, (unsigned char*) &m2_e_len, sizeof(m2_e_len));
  m2_e = malloc(m2_e_len);
  recv(bio, m2_e, m2_e_len);

  m2_len = decrypt_len(skey, skey_len, m2_e, m2_e_len);
  m2 = malloc(m2_len);
  m2_len =
    decrypt(skey, skey_len, m2_e, m2_e_len, m2);

  if(xkey_len + 2 * SIZE_NONCE
     + 2 * sizeof(size_t) + 4 != m2_len)
  {
    printf("A: m2 has wrong length\n");
    exit(1);
  }

  if(memcmp(m2, "msg2", 4))
  {
    printf("A: m2 not properly tagged\n");
    exit(1);
  }

  m2_l1 = *(size_t *) (m2 + 4);
  m2_l2 = *(size_t *) (m2 + 4 + sizeof(size_t));

  if(m2_l1 != SIZE_NONCE)
  {
    printf("A: m2 has wrong length for xNa\n");
    exit(1);
  }

  if(m2_l2 != SIZE_NONCE)
  {
    printf("A: m2 has wrong length for xNb\n");
    exit(1);
  }

  if(memcmp(m2 + 4 + 2 * sizeof(size_t),
            Na, m2_l1))
  {
    printf("A: xNa in m2 doesn't match Na\n");
    exit(1);
  }

#ifndef LOWE_ATTACK
  if(memcmp(m2 + m2_l1 + m2_l2
            + 2 * sizeof(size_t) + 4,
            xkey, xkey_len))
  {
    printf("A: x_xkey in m2 doesn't match xkey\n");
    exit(1);
  }
#endif

  xNb = m2 + m2_l1 + 2 * sizeof(size_t) + 4;

  /* Send message 3 */

  m3_e_len = encrypt_len(xkey, xkey_len, xNb, m2_l2);
  m3_e = malloc(m3_e_len + sizeof(size_t) + 4);
  memcpy(m3_e, "encr", 4);
  m3_e_len =
    encrypt(xkey, xkey_len, xNb, m2_l2,
            m3_e + sizeof(m3_e_len) + 4);
  * (size_t *)(m3_e + 4) = m3_e_len;

  send(bio, m3_e,
       m3_e_len + sizeof(m3_e_len) + 4);

  return 0;
}



H.2 Proxy Functions 

We show examples of proxy functions that replace calls to 
nonce, encrypt, etc. in the symbolic execution. Each func- 
tion starts by calling the actual function that it replaces 
so that the concrete execution can proceed as usual — recall 
that we observe a run of the program in order to identify 
the main path. The proxy functions then call the special 
symbolic interface functions to create new symbolic values 
and place them in memory. These symbolic interface func- 
tions are interpreted specially by the symbolic execution and 
perform the following actions: 

• load_buf(const unsigned char * buf,
           size_t len, const char * hint)

Retrieves from memory the expression located at buf
of length len and places it on the stack. The value
hint is attached to the expression for naming purposes.
For instance, the names of variables in the IML model
shown in appendix H.3 are derived from hints.

• store_buf(const unsigned char * buf)

Takes an expression from the stack and stores it in the
location in memory pointed to by buf.

• symL(const char * sym, const char * hint,
       size_t len, int deterministic)

Applies the operation sym to all the expressions on
the stack as parameters. Sets the length of the new
expression to be equal to len. The last parameter
can be used to specify that the application is
non-deterministic, that is, conceptually it takes an extra
random argument, without having to specify that
argument explicitly. Calls to this function are also used
to model random variable generation. For instance,
the symbol nonce created in nonce_proxy is treated
specially and translates to the ν operator of IML.

• symN(const char * sym, const char * hint,
       size_t * len, int deterministic)

Behaves like symL, but instead of assigning a known
length to the new expression e, keeps its length
unrestricted and writes len(e) into len.

The proxy functions are trusted to represent the true 
behaviour of the actual cryptographic operations. For in- 
stance, the function encrypt is supposed to check the well- 
formedness of the key (corresponding to the symbolic opera- 
tion isek). The actual cryptographic functions are required 
to satisfy the conditions listed in appendix G for the sound- 
ness result to hold. 
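The stack discipline of the symbolic interface can be pictured with a toy model in which expressions are plain strings. This is only a sketch of the idea, not the symbolic executor: the names push_expr, apply_sym and pop_expr are hypothetical stand-ins for the roles of load_buf, symL/symN and store_buf, and lengths, hints and memory are left out.

```c
#include <stdio.h>
#include <string.h>

#define MAX_STACK 8
#define MAX_EXPR  256

/* Toy expression stack: each entry is the text of a symbolic expression. */
static char stack[MAX_STACK][MAX_EXPR];
static int depth = 0;

/* Push an expression (the role load_buf plays for symbolic memory). */
void push_expr(const char *e)
{
    if (depth < MAX_STACK)
        snprintf(stack[depth++], MAX_EXPR, "%s", e);
}

/* Apply sym to all expressions on the stack and leave only the result. */
void apply_sym(const char *sym)
{
    char out[MAX_EXPR];
    size_t used = (size_t)snprintf(out, sizeof out, "%s(", sym);

    for (int i = 0; i < depth && used < sizeof out; i++)
        used += (size_t)snprintf(out + used, sizeof out - used, "%s%s",
                                 i ? ", " : "", stack[i]);
    if (used < sizeof out)
        snprintf(out + used, sizeof out - used, ")");
    depth = 0;
    push_expr(out);
}

/* Pop the top expression into dst (the role store_buf plays). */
void pop_expr(char *dst, size_t n)
{
    if (depth > 0)
        snprintf(dst, n, "%s", stack[--depth]);
    else if (n > 0)
        dst[0] = '\0';
}
```

Pushing pkX, applying isek, pushing msg1 and nonce2, and applying E leaves E(isek(pkX), msg1, nonce2) on the stack, which has the same shape as the cipher1 expression in the IML model of appendix H.3.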

void nonce_proxy(unsigned char * N)
{
  nonce(N);

  symL("nonce", "nonce", SIZE_NONCE, FALSE);
  store_buf(N);
}

size_t encrypt_len_proxy(unsigned char * key,
                         size_t keylen,
                         unsigned char * in,
                         size_t inlen)
{
  size_t ret =
    encrypt_len(key, keylen, in, inlen);

  symL("encrypt_len", "len", sizeof(ret), FALSE);
  store_buf(&ret);

  if(ret < 0) exit(1);

  return ret;
}

size_t encrypt_proxy(unsigned char * key,
                     size_t keylen,
                     unsigned char * in,
                     size_t inlen,
                     unsigned char * out)
{
  size_t ret =
    encrypt(key, keylen, in, inlen, out);

  unsigned char nonce[SIZE_NONCE];
  nonce_proxy(nonce);

  load_buf(key, keylen, "key");
  symN("isek", "key", NULL, TRUE);
  load_buf(in, inlen, "msg");
  load_buf(nonce, SIZE_NONCE, "nonce");
  symN("E", "cipher", &ret, TRUE);
  store_buf(out);

  if(ret > encrypt_len_proxy(key, keylen,
                             in, inlen))
    fail("encrypt_proxy: bad length");

  return ret;
}

unsigned char * get_pkey_proxy(size_t * len,
                               char side)
{
  unsigned char * ret =
    get_pkey(len, side);

  char name[] = "pkX";
  name[2] = side;

  readenv(ret, len, name);

  return ret;
}

H.3 IML Model 

The IML model extracted from both the client and the
server is shown below. The notation e<l> is a shorthand
for "e such that len(e) = l". For instance, in(c, var1<8>);
means in(c, var1); if len(var1) = 8 then.

The model contains several castToInt expressions. These
result from the fact that the implementation uses size_t as
the length type, but the OpenSSL functions that we call use
int. These type conversions are recorded during the symbolic
execution. For now we assume no numeric overflows,
as mentioned in section 8, so the casts are removed before
translating to pi.
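The no-overflow assumption can be made concrete for the cast chain that appears in the model (a length cast to int and then to unsigned long). The following is a small sketch; the function names cast_chain_is_lossless and cast_chain are hypothetical and serve only to illustrate when dropping the casts is justified.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* True iff len passes through int and back to unsigned long unchanged. */
bool cast_chain_is_lossless(size_t len)
{
    return len <= (size_t)INT_MAX;
}

/* The cast chain recorded in the model: size_t -> int -> unsigned long.
   Only meaningful when cast_chain_is_lossless(len) holds; otherwise the
   conversion to int is implementation-defined. */
unsigned long cast_chain(size_t len)
{
    int as_int = (int)len;
    return (unsigned long)as_int;
}
```

For the lengths that occur in the NSL example (such as 20 for a nonce) the chain is the identity, which is the situation in which removing the casts before translation does not change the model.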

let A =
new nonce1<20>;
let msg1 = 6d736731|i20|nonce1|pkA in
new nonce2<20>;
let cipher1 = E(isek(pkX), msg1, nonce2) in
let msg2 = 656e6372|len(cipher1)<8>|cipher1 in
out(c, msg2);
in(c, msg3<8>);
let var1 = (msg3 castToInt TSBase(int))
  castToInt TSBase(unsigned long) in
in(c, msg4<var1>);
let msg5 = D(skA, msg4) in
if len(pkX)<8> + i40 + i16 + i4 = len(msg5)<8> then
if msg5{0, 4} = 6d736732 then
if msg5{4, 8} = i20 then
if msg5{12, 8} = i20 then
let var2 = msg5{20, msg5{4, 8}} in
if var2 = nonce1 then
let var3 =
  msg5{msg5{4, 8} + msg5{12, 8} + i16 + i4,
       len(msg5) - (msg5{4, 8} + msg5{12, 8} + i16 + i4)} in
if var3 = pkX then
let msg6 = msg5{msg5{4, 8} + i16 + i4, msg5{12, 8}} in
new nonce3<20>;
let cipher2 = E(isek(pkX), msg6, nonce3) in
let msg7 = 656e6372|len(cipher2)<8>|cipher2 in
out(c, msg7); 0.

let B =
in(c, msg1<8>);
let var1 = (msg1 castToInt TSBase(int))
  castToInt TSBase(unsigned long) in
in(c, msg2<var1>);
let msg3 = D(skB, msg2) in
if len(pkX)<8> + i8 + i20 + i4 = len(msg3)<8> then
if msg3{0, 4} = 6d736731 then
if msg3{4, 8} = i20 then
let var2 = msg3{i8 + msg3{4, 8} + i4,
                len(msg3) - (i8 + msg3{4, 8} + i4)} in
if var2 = pkX then
let var3 = msg3{4, 8} in
let var4 = msg3{12, msg3{4, 8}} in
new nonce1<20>;
let msg4 = 6d736732|var3|i20|var4|nonce1|pkB in
new nonce2<20>;
let cipher1 = E(isek(pkX), msg4, nonce2) in
let msg5 = 656e6372|len(cipher1)<8>|cipher1 in
out(c, msg5);
in(c, msg6<8>);
let var5 = (msg6 castToInt TSBase(int))
  castToInt TSBase(unsigned long) in
in(c, msg7<var5>);
let msg8 = D(skB, msg7) in
if len(msg8)<8> = i20 then
if msg8 = nonce1 then
event endB(); 0.

H.4 ProVerif Model 

The ProVerif model resulting from the translation of the
IML process is shown below. The processes A and B as well
as the symbolic rules for the new encoding and parsing
expressions conc_i and parse_i are generated automatically from
the source IML process. The rules for encryption and
decryption, the query, and the environment process (including
A' and B') are specified by hand.

The events are used without parameters. This is a limitation
of the result in [4]; our symbolic execution as well
as ProVerif can easily deal with parameterised events. The
modelling is similar to [13, 4]. There the client A' executes
an event beginA() only if it is supposed to talk to B, and B'
executes an event endB() only if it is supposed to talk to A.
The event endB() is executed at the end, so conceptually
B' needs to execute

if pkX = pkA then B; event endB(). else B.

Unfortunately, B; event endB(). does not form a valid process,
so we use an equivalent formulation using an event
notA() instead: endB() is always executed, but it is counted
only if notA() has not been executed.

The meaning of if-statements in pi is different from their
meaning in IML. A pi calculus statement if e1 = e2 then P
corresponds to the IML let _ = eq(e1, e2) in P.

free c.
fun ek/1.
fun dk/1.
fun E/3.
reduc
  D(dk(a), E(ek(a), x, r)) = x.
reduc
  isek(ek(a)) = ek(a).

data conc2/2.
data conc5/3.
data conc11/1.

reduc
  parse2(conc5(x0, x1, x2)) = x0.
reduc
  parse3(conc5(x0, x1, x2)) = x2.
reduc
  parse4(conc5(x0, x1, x2)) = x1.
reduc
  parse6(conc2(x0, x1)) = x1.
reduc
  parse7(conc2(x0, x1)) = x0.

query
  ev:endB() ==> ev:beginA() | ev:notA().

query
  ev:endB() ==> ev:notA().

let A =
new nonce1;
new nonce2;
let var1 =
  conc11(E(isek(pkX), conc2(nonce1, pkA), nonce2)) in
out(c, var1);
in(c, msg1);
in(c, var2);
let var3 = parse2(D(skA, var2)) in
if var3 = nonce1 then
let var4 = parse3(D(skA, var2)) in
if var4 = pkX then
new nonce3;
let var5 =
  conc11(E(isek(pkX), parse4(D(skA, var2)), nonce3)) in
out(c, var5); 0.

let B =
in(c, msg2);
in(c, var27);
let var28 = parse6(D(skB, var27)) in
if var28 = pkX then
new nonce4;
new nonce5;
let var29 =
  conc11(E(isek(pkX),
           conc5(parse7(D(skB, var27)), nonce4, pkB),
           nonce5)) in
out(c, var29);
in(c, msg3);
in(c, var30);
let var31 = D(skB, var30) in
if var31 = nonce4 then
event endB(); 0.

let A' =
in(c, pkX);
if pkX = pkB then
event beginA(); A
else A.

let B' =
in(c, pkX);
if pkX = pkA then B else
event notA(); B.

process
!
new A; new B;
let pkA = ek(A) in
let skA = dk(A) in
let pkB = ek(B) in
let skB = dk(B) in
out(c, pkA); out(c, pkB);
(!A' | !B')






